How to Use Cumulative Aggregation in PostgreSQL: A Step-by-Step Retail Use Case
Learn Cumulative Aggregation in PostgreSQL: Retail Application Example
Cumulative Aggregation in PostgreSQL
Cumulative aggregation, also referred to as running totals or cumulative sums, lets you calculate progressive totals that update as each new row is processed. PostgreSQL supports cumulative aggregation by using aggregate functions such as `SUM()` as window functions, through the `OVER()` clause.
This feature is particularly useful in scenarios like retail analytics, where cumulative metrics such as sales to date, month-to-date revenue, or running inventory levels need to be tracked and analyzed over time.
Syntax for Cumulative Aggregation
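```sql
-- value_column, order_column and table_name are placeholders
SELECT
    order_column,
    value_column,
    SUM(value_column) OVER (ORDER BY order_column) AS cumulative_sum
FROM
    table_name;
```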
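- `SUM()`: An aggregate function that adds up values. Used with `OVER()`, it runs as a window function and returns a value for every row instead of collapsing the rows into one.
- `OVER()`: Defines the window over which the function is computed. The `ORDER BY` inside it specifies the order in which rows are accumulated, so each row's result is the total of all rows up to and including it in that order (rows that tie on the sort value are counted together).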
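The syntax can be tried without creating a table. This minimal sketch uses PostgreSQL's built-in `generate_series()` as the row source; the alias `t(n)` is an illustrative name, not part of the original example.

```sql
-- Running total over the integers 1..5; generate_series() stands in for a table.
SELECT
    n,
    SUM(n) OVER (ORDER BY n) AS cumulative_sum  -- each row adds its n to all earlier rows
FROM
    generate_series(1, 5) AS t(n)
ORDER BY
    n;
-- cumulative_sum: 1, 3, 6, 10, 15
```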
Retail Application Example: Cumulative Sales Tracking
Let’s say you are developing a retail sales application for a company. You have a table `daily_sales` that tracks sales data for various products each day.
Table: daily_sales
| date       | product_id | sales_amount |
|------------|------------|--------------|
| 2024-09-01 | 101        | 500          |
| 2024-09-02 | 101        | 700          |
| 2024-09-03 | 101        | 400          |
| 2024-09-04 | 101        | 600          |
| 2024-09-05 | 101        | 1000         |
| 2024-09-01 | 102        | 300          |
| 2024-09-02 | 102        | 500          |
| 2024-09-03 | 102        | 700          |
| 2024-09-04 | 102        | 200          |
| 2024-09-05 | 102        | 100          |
In this table:
- `date`: The date of the sales transaction.
- `product_id`: The unique identifier for a product.
- `sales_amount`: The total sales amount for that product on that day.
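To reproduce the example, the table can be created and loaded as follows. The column types and the primary key are assumptions for this sketch; the original only shows the data. Naming a column `date` works in PostgreSQL because `date` is not a reserved word there.

```sql
-- Assumed schema: one row per product per day.
CREATE TABLE daily_sales (
    date         date    NOT NULL,  -- day of the sales
    product_id   integer NOT NULL,  -- product identifier
    sales_amount numeric NOT NULL,  -- total sales for that product on that day
    PRIMARY KEY (product_id, date)  -- assumption: no duplicate product/day rows
);

-- Sample rows from the table above.
INSERT INTO daily_sales (date, product_id, sales_amount) VALUES
    ('2024-09-01', 101,  500),
    ('2024-09-02', 101,  700),
    ('2024-09-03', 101,  400),
    ('2024-09-04', 101,  600),
    ('2024-09-05', 101, 1000),
    ('2024-09-01', 102,  300),
    ('2024-09-02', 102,  500),
    ('2024-09-03', 102,  700),
    ('2024-09-04', 102,  200),
    ('2024-09-05', 102,  100);
```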
Goal: Cumulative Sales for Each Product
You want to calculate the cumulative sales over time for each product. For example, the cumulative sales for product 101 should update day by day, adding each day's sales to the total of all previous days.
PostgreSQL Query for Cumulative Sales
```sql
SELECT
    date,
    product_id,
    sales_amount,
    SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales
FROM
    daily_sales
ORDER BY
    product_id, date;
```
Explanation:
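- `SUM(sales_amount)`: The aggregate function that calculates the cumulative sum of sales.
- `OVER (PARTITION BY product_id ORDER BY date)`:
  - `PARTITION BY product_id` ensures that the cumulative sum is calculated separately for each product, restarting at each product's first row.
  - `ORDER BY date` ensures that the sum is accumulated in date order.
- `ORDER BY product_id, date`: Orders the final results first by product, then by date. The `ORDER BY` inside `OVER()` only controls how the sum accumulates; without this outer `ORDER BY`, PostgreSQL does not guarantee the order of the returned rows.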
Result:
| date       | product_id | sales_amount | cumulative_sales |
|------------|------------|--------------|------------------|
| 2024-09-01 | 101        | 500          | 500              |
| 2024-09-02 | 101        | 700          | 1200             |
| 2024-09-03 | 101        | 400          | 1600             |
| 2024-09-04 | 101        | 600          | 2200             |
| 2024-09-05 | 101        | 1000         | 3200             |
| 2024-09-01 | 102        | 300          | 300              |
| 2024-09-02 | 102        | 500          | 800              |
| 2024-09-03 | 102        | 700          | 1500             |
| 2024-09-04 | 102        | 200          | 1700             |
| 2024-09-05 | 102        | 100          | 1800             |
In the above result:
For product 101, the cumulative sales grow progressively from 500 on September 1st to 3200 on September 5th.
For product 102, the cumulative sales start at 300 and grow to 1800 over the same period.
Use Cases for Cumulative Aggregation in Retail
Daily/Monthly Sales Trends:
- Track how sales accumulate over time for specific products or across the entire store. This helps identify trends, spot seasonal peaks, and monitor sales growth or decline.
Example: If a product's cumulative sales suddenly slow down after a marketing campaign, it could indicate the need for promotional adjustments.
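A store-wide running total needs one extra step: aggregate each day's sales across products with `GROUP BY`, then run the window over that aggregate, which is why `SUM()` appears as `SUM(SUM(sales_amount)) OVER (...)`. This sketch inlines the sample data as a CTE named `daily_sales` so it runs on its own; drop the CTE to query the real table. The `month_to_date_sales` column shows how adding `PARTITION BY date_trunc('month', date)` would restart the total each calendar month; with only September data it matches the store-wide column.

```sql
-- Sample data inlined so the query runs standalone; drop this CTE to use the real table.
WITH daily_sales(date, product_id, sales_amount) AS (
    VALUES (DATE '2024-09-01', 101,  500),
           (DATE '2024-09-02', 101,  700),
           (DATE '2024-09-03', 101,  400),
           (DATE '2024-09-04', 101,  600),
           (DATE '2024-09-05', 101, 1000),
           (DATE '2024-09-01', 102,  300),
           (DATE '2024-09-02', 102,  500),
           (DATE '2024-09-03', 102,  700),
           (DATE '2024-09-04', 102,  200),
           (DATE '2024-09-05', 102,  100)
)
SELECT
    date,
    SUM(sales_amount) AS store_sales,              -- all products for that day
    SUM(SUM(sales_amount)) OVER (ORDER BY date)
        AS store_cumulative_sales,                 -- running total across days
    SUM(SUM(sales_amount)) OVER (PARTITION BY date_trunc('month', date)
                                 ORDER BY date)
        AS month_to_date_sales                     -- restarts each calendar month
FROM
    daily_sales
GROUP BY
    date
ORDER BY
    date;
-- store_cumulative_sales: 800, 2000, 3100, 3900, 5000
```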
Customer Purchase Patterns:
- Track customer purchases over time to analyze cumulative spending, which can be used for loyalty programs and promotions.
Example: Cumulative spending over time might help segment high-value customers and tailor promotions accordingly.
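For customer spending, the same pattern applies with `PARTITION BY customer_id`. This sketch assumes a hypothetical `orders` table (`customer_id`, `order_date`, `amount`) with made-up rows and a hypothetical high-value threshold of 1000. Window functions cannot be used in `WHERE`, so the running total is computed in a CTE and filtered in the outer query to find the date each customer first crossed the threshold.

```sql
-- Hypothetical orders data (customer_id, order_date, amount); replace with a real table.
WITH orders(customer_id, order_date, amount) AS (
    VALUES (1, DATE '2024-09-01', 400),
           (1, DATE '2024-09-03', 700),
           (2, DATE '2024-09-02', 250),
           (2, DATE '2024-09-04', 300)
),
running AS (
    -- Cumulative spend per customer, in order-date order
    SELECT
        customer_id,
        order_date,
        SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS cumulative_spend
    FROM orders
)
-- Window results can't be filtered in WHERE at the same level, so filter here.
SELECT
    customer_id,
    MIN(order_date) AS reached_threshold_on
FROM running
WHERE cumulative_spend >= 1000   -- hypothetical high-value threshold
GROUP BY customer_id
ORDER BY customer_id;
-- Returns one row: customer_id 1, reached_threshold_on 2024-09-03
```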
Inventory Management:
- Use cumulative aggregation to monitor how much stock has been sold over time, helping ensure proper inventory replenishment and avoid stockouts.
Example: Retailers can forecast product restocking based on cumulative sales patterns over the past months.
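For inventory, subtracting cumulative units sold from an opening stock level gives an estimate of stock on hand. This sketch assumes hypothetical `unit_sales` and `opening_stock` inputs (the `daily_sales` table records amounts, not units), ignores restocking within the period, and uses a hypothetical reorder point of 25 units. The named `WINDOW w` clause lets several columns reuse one window definition.

```sql
-- Hypothetical inputs: units sold per day and opening stock per product.
WITH unit_sales(product_id, sale_date, units_sold) AS (
    VALUES (101, DATE '2024-09-01', 20),
           (101, DATE '2024-09-02', 35),
           (101, DATE '2024-09-03', 30)
),
opening_stock(product_id, units) AS (
    VALUES (101, 100)
)
SELECT
    s.product_id,
    s.sale_date,
    s.units_sold,
    SUM(s.units_sold) OVER w AS cumulative_units_sold,               -- units sold to date
    o.units - SUM(s.units_sold) OVER w AS units_on_hand,             -- assumes no restocking
    (o.units - SUM(s.units_sold) OVER w) < 25 AS below_reorder_point -- hypothetical reorder point
FROM unit_sales AS s
JOIN opening_stock AS o ON o.product_id = s.product_id
WINDOW w AS (PARTITION BY s.product_id ORDER BY s.sale_date)         -- shared window definition
ORDER BY s.product_id, s.sale_date;
-- units_on_hand: 80, 45, 15; below_reorder_point is true only on 2024-09-03
```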
Revenue Forecasting:
- Cumulative revenue figures can be used to forecast expected sales and budget for future operations.
Additional Features in PostgreSQL for Cumulative Aggregation
ROW_NUMBER():
- You can combine cumulative aggregation with the `ROW_NUMBER()` function to track the nth transaction per product or customer.
```sql
SELECT
    date,
    product_id,
    sales_amount,
    SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales,
    ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY date) AS transaction_number
FROM
    daily_sales
ORDER BY
    product_id, date;
```
This returns the cumulative sum alongside a sequence number for each of a product's rows in date order, which helps track individual sales across time.
RANK():
- The `RANK()` function can be used in combination with cumulative aggregation to rank products based on cumulative sales.
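Window function calls cannot be nested, so ranking products by their running totals takes two steps: compute the cumulative sum in a CTE, then apply `RANK()` per date over that result. A sketch, with the sample data inlined as a CTE named `daily_sales` so it runs standalone:

```sql
-- Sample data inlined so the query runs standalone; drop this CTE to use the real table.
WITH daily_sales(date, product_id, sales_amount) AS (
    VALUES (DATE '2024-09-01', 101,  500),
           (DATE '2024-09-02', 101,  700),
           (DATE '2024-09-03', 101,  400),
           (DATE '2024-09-04', 101,  600),
           (DATE '2024-09-05', 101, 1000),
           (DATE '2024-09-01', 102,  300),
           (DATE '2024-09-02', 102,  500),
           (DATE '2024-09-03', 102,  700),
           (DATE '2024-09-04', 102,  200),
           (DATE '2024-09-05', 102,  100)
),
running AS (
    -- Step 1: running total per product, in date order
    SELECT
        date,
        product_id,
        SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales
    FROM daily_sales
)
-- Step 2: rank the products against each other on every date
SELECT
    date,
    product_id,
    cumulative_sales,
    RANK() OVER (PARTITION BY date ORDER BY cumulative_sales DESC) AS sales_rank
FROM running
ORDER BY date, sales_rank;
-- With this sample data, product 101 has sales_rank 1 on every date.
```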
RANGE / ROWS BETWEEN:
- Fine-tune which rows are included in each cumulative value by adding a frame clause inside the `OVER()` clause, for example `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` for a three-row rolling sum. When the window has an `ORDER BY` but no frame clause, PostgreSQL uses `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so rows that share the same `ORDER BY` value (peers) all receive the same running total.
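The difference between the default frame and an explicit `ROWS` frame shows up when rows tie on the `ORDER BY` value. This sketch uses three hypothetical transactions, two on the same date, with a made-up `sale_id` column as a tiebreaker: the default frame gives both same-day rows their combined total, `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` produces a strict row-by-row total, and `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` turns the sum into a two-row sliding window.

```sql
-- Three hypothetical transactions; two share the date 2024-09-01.
WITH sales(sale_id, sale_date, amount) AS (
    VALUES (1, DATE '2024-09-01', 100),
           (2, DATE '2024-09-01',  50),
           (3, DATE '2024-09-02',  70)
)
SELECT
    sale_id,
    sale_date,
    amount,
    -- Default frame (RANGE ... CURRENT ROW): same-date peers share one total
    SUM(amount) OVER (ORDER BY sale_date) AS range_total,
    -- Explicit ROWS frame plus a tiebreaker: a strict row-by-row running total
    SUM(amount) OVER (ORDER BY sale_date, sale_id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
    -- Sliding frame: the current row and the one before it
    SUM(amount) OVER (ORDER BY sale_date, sale_id
                      ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_two_total
FROM sales
ORDER BY sale_id;
-- range_total: 150, 150, 220; rows_total: 100, 150, 220; last_two_total: 100, 150, 120
```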
Performance Considerations
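Indexing: A window function needs its input sorted by the `PARTITION BY` columns followed by the `ORDER BY` columns. A composite index on those columns in that order, such as `(product_id, date)` for the query above, can let PostgreSQL read rows already in the required order instead of sorting them; whether the planner actually uses it depends on the data and the query, so confirm with `EXPLAIN`.
Partitions: The `PARTITION BY` clause restarts the cumulative sum for each group (here, each product), but it does not reduce the number of rows PostgreSQL has to process. On large tables, limit the work with a `WHERE` clause (for example, a date range), keeping in mind that the running total then starts from the first row that passes the filter.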
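A sketch of the indexing advice on a hypothetical temporary table, `sales_demo`, filled with 100,000 synthetic rows (50 products over 2,000 days); the table and index names are illustrative. Compare the `EXPLAIN` output with and without the index: if the planner reads the rows through the index in `(product_id, date)` order, the plan needs no separate Sort step below the WindowAgg node. Whether that happens depends on table size, statistics, and the filter, so no particular plan is guaranteed.

```sql
-- Hypothetical demo table: 50 products x 2,000 days of synthetic sales.
CREATE TEMP TABLE sales_demo AS
SELECT
    DATE '2024-01-01' + (g / 50)         AS date,          -- integer division: 50 rows per day
    (g % 50) + 1                          AS product_id,    -- products 1..50
    round((random() * 1000)::numeric, 2)  AS sales_amount   -- random amount below 1000
FROM generate_series(0, 99999) AS g;

-- Composite index: the PARTITION BY column first, then the window's ORDER BY column.
CREATE INDEX sales_demo_product_date_idx ON sales_demo (product_id, date);
ANALYZE sales_demo;  -- refresh planner statistics

-- Inspect the plan; run it again after DROP INDEX to compare.
EXPLAIN
SELECT
    date,
    product_id,
    sales_amount,
    SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales
FROM sales_demo
WHERE product_id = 7;
```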
Conclusion
Cumulative aggregation in PostgreSQL is a powerful tool for tracking progressive metrics such as sales, revenue, and customer purchases in retail applications. It provides a simple yet efficient way to analyze running totals over time, enabling businesses to make informed decisions about inventory, marketing, and operations. By utilizing window functions like `SUM()` with the `OVER()` clause, retailers can unlock valuable insights from their data to improve performance and strategy.