Skip to main content
Learning DUB

The GROUP BY Clause is used to group data by column values and specifies the groups into which output rows are to be placed. It also determines if aggregate functions are included in the SELECT clause (select list), and calculates a summary value for each group.

In other words, whenever the SELECT clause of your SELECT statement includes column values and aggregate functions, you must include the GROUP BY clause. The GROUP BY clause must list all of the column values used in the SELECT clause.

The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set. When a GROUP BY clause is used, each item in the select list must produce a single value for each group. There is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns. Null values in one item are placed in one group.

A simple GROUP BY clause

The following example retrieves the total for each SalesOrderID from the SalesOrderDetail table.

A.

--
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail AS sod
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
GO

-- Retrieve the number of employees in each city
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City;

The GROUP BY clause should only be used when there is one or more aggregate functions (e.g., COUNT, MAX, MIN, SUM, AVG) in the SELECT clause.

Use a GROUP BY clause with multiple tables

The following example retrieves the number of employees for each City from the Address table joined to the EmployeeAddress table. This example uses AdventureWorks.

B.

--
SELECT a.City, COUNT(bea.AddressID) EmployeeCount
FROM Person.BusinessEntityAddress AS bea
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City;
GO

Use a GROUP BY clause with an expression

The following example retrieves the total sales for each year by using the DATEPART function. The same expression must be present in both the SELECT list and GROUP BY clause.

C.

--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GO

Use a GROUP BY clause with a HAVING clause

The HAVING clause is used to filter grouped data. For example, the following example uses the HAVING clause to specify which of the groups generated in the GROUP BY clause should be included in the result set.

D.

--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= N'2003'
ORDER BY DATEPART(yyyy,OrderDate);
GO

/* Retrieve the number of employees in each city in which there are at least 2 employees. */
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City
HAVING COUNT(EmployeeID) > 1;

Order of Clauses
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY

Here is an example of a query using every clause listed above:

/* Find the number of sales representatives in each city that contains at least 2 sales representatives. Order by the number of employees. */
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
WHERE Title = 'Sales Representative'
GROUP BY City
HAVING COUNT(EmployeeID) > 1
ORDER BY NumEmployees;

Grouping Rules you must know

1. Every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause.
2. You may not use aliases in the HAVING clause. MySQL allows usage of aliases in the HAVING clause, but you may want to avoid this to keep your code as cross-database compatible as possible.
3. You may use aliases in the ORDER BY clause.
4. To filter rows based on aggregate functions, use the HAVING clause.
5. You should declare column aliases for any calculated fields in the SELECT clause.
6. You may not use aggregate functions in the WHERE clause.

How to Group Rows with GROUP BY

When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement. The GROUP BY clause is used to produce aggregate values for each row in the result set.

This example returns the number of units sold for each product in category 2:

SELECT OrdD.ProductID AS ProdID,
SUM(OrdD.Quantity) AS AmountSold
FROM [Order Details] AS OrdD JOIN Products as Prd
ON OrdD.ProductID = Prd.ProductID
AND Prd.CategoryID = 2
GROUP BY OrdD.ProductID

Here is the result set:
ProdID AmountSold
3 328
4 453
5 298
6 301
8 372
15 122
44 601
61 603
63 445
65 745
66 239
77 791
** (12 row(s) affected)
The GROUP BY clause contains the following components:

One or more aggregate-free expressions. These are usually references to the grouping columns.
Optionally, the ALL keyword, which specifies that all groups produced by the GROUP BY clause are returned, even if some of the groups do not have any rows that meet the search conditions.
CUBE or ROLLUP.
The HAVING clause is typically used in conjunction with the GROUP BY clause, although HAVING can be specified separately.

You can group by an expression as long as it does not include aggregate functions, for example:

SELECT DATEPART(yy, HireDate) AS Year,
COUNT(*) AS NumberOfHires
FROM Northwind.dbo.Employees
GROUP BY DATEPART(yy, HireDate)

In a GROUP BY, you must specify the name of a table or view column, not the name of a result set column assigned with an AS clause. For example, replacing the GROUP BY DATEPART(yy, HireDate) clause earlier with GROUP BY Year is not legal. You can list more than one column in the GROUP BY clause to nest groups; that is, you can group a table by any combination of columns. For example, this query finds the average price and the sum of year-to-date sales, grouped by type and publisher ID:

USE pubs
SELECT type, pub_id, 'avg' = AVG(price), 'sum' = sum(ytd_sales)
FROM titles
GROUP BY type, pub_id

Here is the result set:
type pub_id avg avg sum
business 0736 11.96 18722
business 1389 17.31 12066
mod_cook 0877 11.49 24278
popular_comp 1389 21.48 12875
psychology 0736 45.93 9564
psychology 0877 21.59 375
trad_cook 0877 15.96 19566
UNDECIDED 0877 (null) (null)
** (8 row(s) affected)
How to GROUP BY using the WHERE Clause

You can use a WHERE clause in a query containing a GROUP BY clause. Rows not meeting the conditions in the WHERE clause are eliminated before any grouping is done, for example:

USE pubs
SELECT type, AVG(price)
FROM titles
WHERE advance > $5000
GROUP BY type

Here is the result set:
Only rows with advances greater than $5,000 are included in the groups shown in the query results.

type
business 2.99
mod_cook 2.99
popular_comp 21.48
psychology 14.30
trad_cook 17.97
** (5 row(s) affected)
///////////// Delete Below ///////////////////
How to Choose Rows with the HAVING Clause

The HAVING clause sets conditions on the GROUP BY clause similar to the way WHERE interacts with SELECT. The WHERE search condition is applied before the grouping operation occurs; the HAVING search condition is applied after the grouping operation occurs. The HAVING syntax is exactly like the WHERE syntax, except HAVING can contain aggregate functions. HAVING clauses can reference any of the items that appear in the select list.

This query finds publishers who have had year-to-date sales greater than $40,000:

USE pubs
SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING SUM(ytd_sales) > 40000

Here is the result set: pub_id total ------ ----------- 0877 44219 (1 row(s) affected)

To make sure there are at least six books involved in the calculations for each publisher, this example uses HAVING COUNT(*) > 5 to eliminate the publishers that return totals for fewer than six books:

USE pubs
SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING COUNT(*) > 5

Here is the result set: pub_id total ------ ----------- 0877 44219 1389 24941 (2 row(s) affected)

Understanding the correct sequence in which the WHERE, GROUP BY, and HAVING clauses are applied helps in coding efficient queries:
The WHERE clause is used to filter the rows that result from the operations specified in the FROM clause.
The GROUP BY clause is used to group the output of the WHERE clause.
The HAVING clause is used to filter rows from the grouped result.

For any search conditions that could be applied either before or after the grouping operation, it is more efficient to specify them in the WHERE clause. This reduces the number of rows that have to be grouped. The only search conditions that should be specified in the HAVING clause are those search conditions that must be applied after the grouping operation has been performed.

The following query shows HAVING with an aggregate function. It groups the rows in the titles table by type and eliminates the groups that include only one book.

USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING COUNT(*) > 1

**
Here is the result set: type ------------ business mod_cook popular_comp psychology trad_cook (5 row(s) affected) This is an example of a HAVING clause without aggregate functions. It groups the rows in titles by type and eliminates those types that do not start with the letter p.

USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING type LIKE 'p%'

Here is the result set: type ------------ popular_comp psychology (2 row(s) affected)

When multiple conditions are included in HAVING, they are combined with AND, OR, or NOT. The following example shows how to group titles by publisher, including only those publishers with identification numbers greater than 0800, who have paid more than $15,000 in total advances, and who sell books for an average of less than $20.

SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20

ORDER BY can be used to order the output of a GROUP BY clause. This example shows using the ORDER BY clause to define the order in which the rows from a GROUP BY clause are returned:

SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
AND price >= $5
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY pub_id DESC

////////////////////////////////////////// 5 /////////This is another page//////////////////////////////////
How to Use GROUP BY and ALL
Transact-SQL provides the ALL keyword in the GROUP BY clause. ALL is meaningful only when the SELECT statement also includes a WHERE clause. If you use ALL, the query results include all groups produced by the GROUP BY clause, even if some of the groups have no rows that meet the search conditions. Without ALL, a SELECT statement that includes GROUP BY does not show groups for which no rows qualify. Here are two examples:

USE pubs
SELECT type, AVG(price)
FROM titles
WHERE royalty = 10
GROUP BY type

Here is the result set: type ------------ -------------------------- business 17.31 popular_comp 20.00 psychology 14.14 trad_cook 17.97 (4 row(s) affected) USE pubs
SELECT type, AVG(price)
FROM titles
WHERE royalty = 10
GROUP BY ALL type
**
Here is the result set: type ------------ -------------------------- business 17.31 mod_cook (null) popular_comp 20.00 psychology 14.14 trad_cook 17.97 UNDECIDED (null) (6 row(s) affected) The first query produces groups only for those books that commanded royalties of 10 percent. Because no modern cookbooks have a royalty of 10 percent, there is no group in the results for the mod_cook type. The second query produces groups for all types, including modern cookbooks and UNDECIDED, even though the modern cookbook group doesn t include any rows that meet the qualification specified in the WHERE clause. The column that holds the aggregate value (the average price) is NULL for groups that lack qualifying rows. See Also ALL SELECT ////////////////////////////////////////// 5 /////////This is another page//////////////////////////////////
How to Use GROUP BY and Null Values
If the grouping column contains a null value, that row becomes a group in the results. If the grouping column contains more than one null value, the null values are put into a single group. This behavior is defined in the SQL-92 standard. The royalty column in the titles table contains some null values, for example: SELECT royalty, AVG(price * 2) AS AveragePrice
FROM pubs.dbo.titles
GROUP BY royalty
**
Here is the result set: royalty AveragePrice ----------- -------------------------- (null) (null) 10 32.89 12 30.94 14 23.90 16 45.90 24 5.98 (6 row(s) affected) This SELECT statement can be easily changed to remove the null values by adding a WHERE clause:

SELECT royalty, AVG(price * 2) AS AveragePrice
FROM pubs.dbo.titles
WHERE royalty IS NOT NULL
GROUP BY royalty

///////////////////////////////////////////////

Related Content

Scroll to Top