Skip to main content
Learning DUB

The GROUP BY Clause specifies the groups into which output rows are to be placed. It also determines if aggregate functions are included in the SELECT clause (select list), and calculates a summary value for each group.

In other words, whenever the SELECT clause of your SELECT statement includes column values and aggregate functions, you must include the GROUP BY clause. The GROUP BY clause must list all of the column values used in the SELECT clause.

The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set. When a GROUP BY clause is used, each item in the select list must produce a single value for each group. There is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns. Null values in one item are placed in one group.

A simple GROUP BY clause

The following example retrieves the total for each SalesOrderID from the SalesOrderDetail table.

A.

--
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail AS sod
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
GO

-- Retrieve the number of employees in each city
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City;

The GROUP BY clause should only be used when there is one or more aggregate functions (e.g., COUNT, MAX, MIN, SUM, AVG) in the SELECT clause.

GROUP BY Clause

The GROUP BY clause is used to group the output of the WHERE clause.

GROUP BY and the WHERE Clause 445

The WHERE clause is used to filter the rows that result from the operations specified in the FROM clause.

Choosing Rows with the HAVING Clause 500

The HAVING clause is used to filter rows from the grouped result.

GROUP BY and ALL 715

GROUP BY and Null Values 842

Use a GROUP BY clause with multiple tables

The following example retrieves the number of employees for each City from the Address table joined to the EmployeeAddress table. This example uses AdventureWorks.

B.

--
SELECT a.City, COUNT(bea.AddressID) EmployeeCount
FROM Person.BusinessEntityAddress AS bea
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City;
GO

Use a GROUP BY clause with an expression

The following example retrieves the total sales for each year by using the DATEPART function. The same expression must be present in both the SELECT list and GROUP BY clause.

C.

--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GO

Use a GROUP BY clause with a HAVING clause

The HAVING clause is used to filter grouped data. For example, the following example uses the HAVING clause to specify which of the groups generated in the GROUP BY clause should be included in the result set.

D.

--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= N'2003'
ORDER BY DATEPART(yyyy,OrderDate);
GO

/* Retrieve the number of employees in each city in which there are at least 2 employees. */
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City
HAVING COUNT(EmployeeID) > 1;

Order of Clauses
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY

Here is an example of a query using every clause listed above:

/* Find the number of sales representatives in each city that contains at least 2 sales representatives. Order by the number of employees. */
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
WHERE Title = 'Sales Representative'
GROUP BY City
HAVING COUNT(EmployeeID) > 1
ORDER BY NumEmployees;

Understanding the correct sequence in which the WHERE, GROUP BY, and HAVING clauses are applied helps in coding efficient queries:
The WHERE clause is used to filter the rows that result from the operations specified in the FROM clause.
The GROUP BY clause is used to group the output of the WHERE clause.
The HAVING clause is used to filter rows from the grouped result.

For any search conditions that could be applied either before or after the grouping operation, it is more efficient to specify them in the WHERE clause. This reduces the number of rows that have to be grouped. The only search conditions that should be specified in the HAVING clause are those search conditions that must be applied after the grouping operation has been performed.

The following query shows HAVING with an aggregate function. It groups the rows in the titles table by type and eliminates the groups that include only one book.

Grouping Rules you must know

1. Every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause.
2. You may not use aliases in the HAVING clause. MySQL allows usage of aliases in the HAVING clause, but you may want to avoid this to keep your code as cross-database compatible as possible.
3. You may use aliases in the ORDER BY clause.
4. To filter rows based on aggregate functions, use the HAVING clause.
5. You should declare column aliases for any calculated fields in the SELECT clause.
6. You may not use aggregate functions in the WHERE clause.

/////////////////////////////////////////////
How to Group Rows with GROUP BY

When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement. The GROUP BY clause is used to produce aggregate values for each row in the result set.

This example returns the number of units sold for each product in category 2:

SELECT OrdD.ProductID AS ProdID,
SUM(OrdD.Quantity) AS AmountSold
FROM [Order Details] AS OrdD JOIN Products as Prd
ON OrdD.ProductID = Prd.ProductID
AND Prd.CategoryID = 2
GROUP BY OrdD.ProductID

Here is the result set:

ProdID AmountSold
3 328
4 453
5 298
6 301
8 372
15 122
44 601
61 603
63 445
65 745
66 239
77 791
** (12 row(s) affected)
The GROUP BY clause contains the following components:

One or more aggregate-free expressions. These are usually references to the grouping columns.
Optionally, the ALL keyword, which specifies that all groups produced by the GROUP BY clause are returned, even if some of the groups do not have any rows that meet the search conditions.
CUBE or ROLLUP.
The HAVING clause is typically used in conjunction with the GROUP BY clause, although HAVING can be specified separately.

You can group by an expression as long as it does not include aggregate functions, for example:

SELECT DATEPART(yy, HireDate) AS Year,
COUNT(*) AS NumberOfHires
FROM Northwind.dbo.Employees
GROUP BY DATEPART(yy, HireDate)

In a GROUP BY, you must specify the name of a table or view column, not the name of a result set column assigned with an AS clause. For example, replacing the GROUP BY DATEPART(yy, HireDate) clause earlier with GROUP BY Year is not legal. You can list more than one column in the GROUP BY clause to nest groups; that is, you can group a table by any combination of columns. For example, this query finds the average price and the sum of year-to-date sales, grouped by type and publisher ID:

USE pubs
SELECT type, pub_id, 'avg' = AVG(price), 'sum' = sum(ytd_sales)
FROM titles
GROUP BY type, pub_id

Here is the result set:

type pub_id avg avg sum
business 0736 11.96 18722
business 1389 17.31 12066
mod_cook 0877 11.49 24278
popular_comp 1389 21.48 12875
psychology 0736 45.93 9564
psychology 0877 21.59 375
trad_cook 0877 15.96 19566
UNDECIDED 0877 (null) (null)
** (8 row(s) affected)

USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING COUNT(*) > 1

**
Here is the result set: type ------------ business mod_cook popular_comp psychology trad_cook (5 row(s) affected) This is an example of a HAVING clause without aggregate functions. It groups the rows in titles by type and eliminates those types that do not start with the letter p.

USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING type LIKE 'p%'

Here is the result set: type ------------ popular_comp psychology (2 row(s) affected)

When multiple conditions are included in HAVING, they are combined with AND, OR, or NOT. The following example shows how to group titles by publisher, including only those publishers with identification numbers greater than 0800, who have paid more than $15,000 in total advances, and who sell books for an average of less than $20.

SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20

ORDER BY can be used to order the output of a GROUP BY clause. This example shows using the ORDER BY clause to define the order in which the rows from a GROUP BY clause are returned:

SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
AND price >= $5
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY pub_id DESC

Related Content

Scroll to Top