15. Window Functions
Powering Advanced Analytics in SQL
Window Functions perform calculations across a set of table rows that are somehow related to the current row, but crucially, they do not collapse the rows. This allows for sophisticated analytical tasks like ranking, calculating running totals, or comparing a row's value to a previous or next row, all within the same result set.
1. Ranking Functions: ROW_NUMBER(), RANK(), DENSE_RANK()
These functions assign a rank or sequential number to each row within its partition, based on an ORDER BY clause within the OVER() clause.
Understanding the Differences
ROW_NUMBER(): Assigns a unique, sequential integer to each row within its partition, starting from 1. If values are tied, they get different numbers.RANK(): Assigns the same rank to tied values. The next rank after a tie is skipped (e.g., if two rows are rank 1, the next rank is 3).DENSE_RANK(): Assigns the same rank to tied values. The next rank after a tie is consecutive (e.g., if two rows are rank 1, the next rank is 2).
Example: Ranking People by Age
People Table
| PersonID | Age |
| 1 | 20 |
| 2 | 22 |
| 3 | 23 |
| 4 | 24 |
| 5 | 24 |
| 6 | 31 |
People;
Results:
| PersonID | Age | RowNum | RankNum | DenseRankNum |
| 6 | 31 | 1 | 1 | 1 |
| 5 | 24 | 2 | 2 | 2 |
| 4 | 24 | 3 | 2 | 2 |
| 3 | 23 | 4 | 4 | 3 |
| 2 | 22 | 5 | 5 | 4 |
| 1 | 20 | 6 | 6 | 5 |
2. Lag/Lead Functions: Comparing Rows (LAG(), LEAD())
These functions allow you to access data from a previous or next row within the same result set, defined by the ORDER BY and PARTITION BY clauses.
Understanding the Differences
LAG(column, offset, default_value): Retrieves the value ofcolumnfrom a row that isoffsetrows before the current row within its partition.LEAD(column, offset, default_value): Retrieves the value ofcolumnfrom a row that isoffsetrows after the current row within its partition.
Example: Tracking Increasing Temperatures per City
WeatherData Table
| City | Temp (C) | Weather |
| Boston | 15 | Humid |
| Boston | 18 | Sunny |
| Boston | 20 | Sunny |
| NY | 12 | Windy |
| NY | 15 | Humid |
| NY | 17 | Sunny |
| Chicago | 10 | Windy |
| Chicago | 12 | Humid |
| Chicago | 14 | Sunny |
Goal: For each city, identify if the current temperature is higher than the previous temperature recorded in that city.
Here, PARTITION BY City means the LAG function restarts its count for each city.
Results (Ordered by City, then Temp for clarity):
| City | CurrentTemp | PreviousTemp | IsTempIncreasingFromPrevious |
| Boston | 15 | NULL | No |
| Boston | 18 | 15 | Yes |
| Boston | 20 | 18 | Yes |
| Chicago | 10 | NULL | No |
| Chicago | 12 | 10 | Yes |
| Chicago | 14 | 12 | Yes |
| NY | 12 | NULL | No |
| NY | 15 | 12 | Yes |
| NY | 17 | 15 | Yes |
The first entry for each city has
NULLforPreviousTempbecause there's no preceding row in that partition.
Other Essential Window Functions
The following functions are often used as Aggregate Window Functions because they perform a standard aggregation (SUM, AVG, MAX, MIN) over a specified window without collapsing the rows.
3. Running Total (Cumulative Sum)
The Running Total shows the sum of a value up to the current row, often used to track progress over time. This requires an unbounded preceding window frame.
Goal: Calculate the cumulative total sales as each transaction occurs.
4. Moving Average (Rolling Average)
The Moving Average (or rolling average) smooths out short-term fluctuations by averaging a set number of preceding and following rows. This helps reveal underlying trends.
Goal: Calculate the average temperature over the current day and the preceding 2 days.
5. Calculating Percentage of Total (Partitioned Aggregation)
You can use an aggregate function over the entire dataset (or partition) to calculate how the current row relates to the whole.
Goal: Find what percentage of each city's total sales came from a specific product.
In this example, SUM(TotalSales) OVER (PARTITION BY City) calculates the total sales for that city once, and that single value is applied to every row belonging to that city, enabling the percentage calculation.
Window functions transform complex analytical problems into elegant and efficient SQL queries, enabling powerful insights directly from your database.
| Concept | Purpose | Analogy |
OVER() | Defines the window (the set of rows) the function will operate on. | The entire frame of the window. |
PARTITION BY | Groups the rows into separate sets, causing the window function to restart for each group. | The wall separating rooms; calculations restart in each room. |
ORDER BY | Orders the rows within each partition, which is critical for ranking and sequential functions (LAG, LEAD, cumulative sums). | The sequence of files on the desk within one room. |
ROWS BETWEEN | Defines the window frame (the specific subset of rows) relative to the current row, necessary for running totals and moving averages. | The specific number of files you're looking at right now. |
SQL Window Functions: Summary Table
| Function Category | Function | Purpose | Example OVER() Clause |
| Ranking | ROW_NUMBER() | Assigns a unique, sequential integer to each row. Ignores ties. | OVER (PARTITION BY City ORDER BY Temp DESC) |
RANK() | Assigns the same rank to tied values and skips the next rank number. | OVER (ORDER BY Salary DESC) | |
DENSE_RANK() | Assigns the same rank to tied values but uses a consecutive number for the next rank. | OVER (ORDER BY Age DESC) | |
| --- | --- | --- | --- |
| Navigational | LAG() | Accesses the value from a row a specified number of steps before the current row. | OVER (PARTITION BY Product ORDER BY Date) |
LEAD() | Accesses the value from a row a specified number of steps after the current row. | OVER (PARTITION BY City ORDER BY Time) | |
| --- | --- | --- | --- |
| Aggregate | SUM() (as a Window Function) | Calculates a running total or a cumulative sum over a defined range. | OVER (ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) |
AVG() (as a Window Function) | Calculates a moving average over a defined set of rows. | OVER (ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) | |
COUNT() (as a Window Function) | Calculates the total count within the defined partition (e.g., total employees in a department). | OVER (PARTITION BY DepartmentID) |
Comments
Post a Comment