15. Window Functions

Powering Advanced Analytics in SQL

 Window Functions  perform calculations across a set of table rows that are somehow related to the current row, but crucially, they do not collapse the rows. This allows for sophisticated analytical tasks like ranking, calculating running totals, or comparing a row's value to a previous or next row, all within the same result set.

1. Ranking Functions: ROW_NUMBER(), RANK(), DENSE_RANK()

These functions assign a rank or sequential number to each row within its partition, based on an ORDER BY clause within the OVER() clause.

Understanding the Differences

  • ROW_NUMBER(): Assigns a unique, sequential integer to each row within its partition, starting from 1. If values are tied, they get different numbers.

  • RANK(): Assigns the same rank to tied values. The next rank after a tie is skipped (e.g., if two rows are rank 1, the next rank is 3).

  • DENSE_RANK(): Assigns the same rank to tied values. The next rank after a tie is consecutive (e.g., if two rows are rank 1, the next rank is 2).

Example: Ranking People by Age

People Table 

PersonIDAge
120
222
323
424
524
631

SELECT 
PersonID,
Age,
ROW_NUMBER() OVER (OEDER BY Age DESC) AS RowNum,
RANK() OVER (OEDER BY Age DESC) AS Rank,
DENSE_RANK() OVER (OEDER BY Age DESC) AS Dense_Rank
FROM People;

    People;

Results:

PersonIDAgeRowNumRankNumDenseRankNum
631111
524222
424322
323443
222554
120665

2. Lag/Lead Functions: Comparing Rows (LAG(), LEAD())

These functions allow you to access data from a previous or next row within the same result set, defined by the ORDER BY and PARTITION BY clauses.

Understanding the Differences

  • LAG(column, offset, default_value): Retrieves the value of column from a row that is offset rows before the current row within its partition.

  • LEAD(column, offset, default_value): Retrieves the value of column from a row that is offset rows after the current row within its partition.

Example: Tracking Increasing Temperatures per City

WeatherData Table 

CityTemp (C)Weather
Boston15Humid
Boston18Sunny
Boston20Sunny
NY12Windy
NY15Humid
NY17Sunny
Chicago10Windy
Chicago12Humid
Chicago14Sunny

Goal: For each city, identify if the current temperature is higher than the previous temperature recorded in that city.

Here, PARTITION BY City means the LAG function restarts its count for each city.


SELECT 
City,
Temperature AS CurrentTemp,
LEAD(Temp,1) OVER (PARTITION BY City ORDER BY Temp) AS PreviousTemp,
CASE
        WHEN Temp > LAG(Temp, 1) OVER 
                   (PARTITION BY City ORDER BY Temp) THEN 'Yes'
        ELSE 'No'
    END AS IsTempIncreasingFromPrevious
FROM
    WeatherData;

Results (Ordered by City, then Temp for clarity):

CityCurrentTempPreviousTempIsTempIncreasingFromPrevious
Boston15NULLNo
Boston1815Yes
Boston2018Yes
Chicago10NULLNo
Chicago1210Yes
Chicago1412Yes
NY12NULLNo
NY1512Yes
NY1715Yes
  • The first entry for each city has NULL for PreviousTemp because there's no preceding row in that partition.

Other Essential Window Functions

The following functions are often used as Aggregate Window Functions because they perform a standard aggregation (SUM, AVG, MAX, MIN) over a specified window without collapsing the rows.

3. Running Total (Cumulative Sum)

The Running Total shows the sum of a value up to the current row, often used to track progress over time. This requires an unbounded preceding window frame.

Goal: Calculate the cumulative total sales as each transaction occurs.


4. Moving Average (Rolling Average)

The Moving Average (or rolling average) smooths out short-term fluctuations by averaging a set number of preceding and following rows. This helps reveal underlying trends.

Goal: Calculate the average temperature over the current day and the preceding 2 days.


5. Calculating Percentage of Total (Partitioned Aggregation)

You can use an aggregate function over the entire dataset (or partition) to calculate how the current row relates to the whole.

Goal: Find what percentage of each city's total sales came from a specific product.

SELECT 
 City,
 ProductCategory,
 TotalSales,
 SUM(TotalSales) OVER (PARTITION BY City) AS CityTotalSales,
    
(TotalSales * 100.0 / SUM(TotalSales) OVER 
(PARTITION BY City)) AS PercentOfCitySales

FROM
    CitySalesByCategory;


In this example, SUM(TotalSales) OVER (PARTITION BY City) calculates the total sales for that city once, and that single value is applied to every row belonging to that city, enabling the percentage calculation.


Window functions transform complex analytical problems into elegant and efficient SQL queries, enabling powerful insights directly from your database. 


ConceptPurposeAnalogy
OVER()Defines the window (the set of rows) the function will operate on.The entire frame of the window.
PARTITION BYGroups the rows into separate sets, causing the window function to restart for each group.The wall separating rooms; calculations restart in each room.
ORDER BYOrders the rows within each partition, which is critical for ranking and sequential functions (LAG, LEAD, cumulative sums).The sequence of files on the desk within one room.
ROWS BETWEENDefines the window frame (the specific subset of rows) relative to the current row, necessary for running totals and moving averages.The specific number of files you're looking at right now.

SQL Window Functions: Summary Table

Function CategoryFunctionPurposeExample OVER() Clause
RankingROW_NUMBER()Assigns a unique, sequential integer to each row. Ignores ties.OVER (PARTITION BY City ORDER BY Temp DESC)
RANK()Assigns the same rank to tied values and skips the next rank number.OVER (ORDER BY Salary DESC)
DENSE_RANK()Assigns the same rank to tied values but uses a consecutive number for the next rank.OVER (ORDER BY Age DESC)
------------
NavigationalLAG()Accesses the value from a row a specified number of steps before the current row.OVER (PARTITION BY Product ORDER BY Date)
LEAD()Accesses the value from a row a specified number of steps after the current row.OVER (PARTITION BY City ORDER BY Time)
------------
AggregateSUM() (as a Window Function)Calculates a running total or a cumulative sum over a defined range.OVER (ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AVG() (as a Window Function)Calculates a moving average over a defined set of rows.OVER (ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
COUNT() (as a Window Function)Calculates the total count within the defined partition (e.g., total employees in a department).OVER (PARTITION BY DepartmentID)

Comments

Popular Posts