CUME_DIST and PERCENT_RANK new analytical functions in SQL Server 2012
Regular reader of this blog already new that we have already seen new conversion functions and date time functions introduced in SQL Server 2012. Today we are going to look at some of the new analytical function in SQL Server 2012. Following is the list of functions we are going to cover in this article.
- CUME_DIST ()
- PERCENT_RANK ()
Before we look at the example of all these wonderful analytical functions in SQL Server 2012, let us look at the brief definition of these functions.
CUME_DIST : CUME_DIST function returns the cumulative distribution of a value in a group of values. That is, CUME_DIST computes the relative position of a specified value in a group of values. For a row r, assuming ascending ordering, the CUME_DIST of r is the number of rows with values lower than or equal to the value of r, divided by the number of rows evaluated in the partition or query result set. This function gives the percentage of values less than or equal to the current value in the group.
PERCENT_RANK : This function is almost same as CUME_DIST function. The range of values returned by PERCENT_RANK is greater than equal to 0 and less than or equal to 1. The first row in any set has a PERCENT_RANK of 0. This function calculates the percentage of values less than the current value in the group, excluding the highest value, Percent_Rank() for the highest value in a group will always be 1.
After having short definition of both functions, let us examine this fact by following TSQL Script:
CREATE TABLE #AnalyticFunctionCheck ( ID INT IDENTITY(1,1) ,ClientID VARCHAR(10) ,TestName VARCHAR(20) ,TotalSample INT ) INSERT INTO #AnalyticFunctionCheck SELECT 'CHEM02','VOCMS GROUP 1',4 UNION ALL SELECT 'CHEM02','SVOC GROUP 1',6 UNION ALL SELECT 'CHEM02','SVOC STARS',12 UNION ALL SELECT 'CHEM02','PESTICIDE',4 UNION ALL SELECT 'SHAW01','PESTICIDE',10 UNION ALL SELECT 'SHAW01','SVOC STARS',3 UNION ALL SELECT 'SHAW01','SVOC GROUP 1',9 UNION ALL SELECT 'EPAW01','SVOC GROUP 1',14 SELECT * ,CUME_DIST() OVER(PARTITION BY TestName ORDER BY ClientID) AS CumeDist ,PERCENT_RANK() OVER(PARTITION BY TestName ORDER BY ClientID) AS PercentRank FROM #AnalyticFunctionCheck
As soon as your run the above give code, you will get result set something like following screen capture:
Now observe we have first group “Pesticide” test. We have two rows in first group so CumeDist starts with “0.5”. This value comes by following calculation:
Row Number 1 (ID=4) : 1st row in group/total number of rows (1/2)=0.5
Row Number 2 (ID=5) : 2nd row in group/total number of rows (2/2)=1
In short, CUME_DIST=Number of the row we are calculating in the group/total number of row in group.
This is how Cume_Dist column is calculated.
Percent_Rank always starts with 0 so first value in the group will always have 0. Remaining values will be calculated as formula. See we have three rows (3, 4 and 5) in “Group 2”. First value is coming 0 as it is the first value as per our Order By clause. Second and third value calculated as per following formula.
Row Number 3 (ID=2) : (1st row in group-1.0)/(total number of rows-1.0) = (1-1.0)/(3-1) =0
Row Number 4 (ID=8) : (2ND row in group-1.0)/(total number of rows-1.0) = (2-1.0)/(3-1) =0.50
Row Number 5 (ID=7) : (3rd row in group-1.0)/(total number of rows-1.0) = (3-1.0)/(3-1) =1.0
In short, PERCENT_RANK=(Number of the row we are calculating in the group-1.0)/(total number of row in group-1).
Both of these are really wonderful and useful function. I am happy that it is included in SQL Server 2012.
Reference: Ritesh Shah
Note: Microsoft Books online is a default reference of all articles.