How to Use Cumulative Aggregation in PostgreSQL: A Step-by-Step Retail Use Case

Learn Cumulative Aggregation in PostgreSQL: Retail Application Example

·

5 min read

Cumulative Aggregation in PostgreSQL

Cumulative aggregation, also referred to as running totals or cumulative sums, allows you to calculate progressive sums or totals, which update as each new row is processed. PostgreSQL supports cumulative aggregation using window functions, such as SUM() combined with the OVER() clause.

This feature is particularly useful in scenarios like retail analytics, where cumulative metrics such as daily sales, monthly revenue, or running inventory levels need to be tracked and analyzed over time.


Syntax for Cumulative Aggregation

SELECT 
    column, 
    SUM(column) OVER (ORDER BY column) AS cumulative_sum
FROM 
    table;
  • SUM(): An aggregation function that sums up the values.

  • OVER(): Defines the window over which the function is computed. The ORDER BY clause specifies the order of the rows for which the cumulative sum will be calculated.


Retail Application Example: Cumulative Sales Tracking

Let’s say you are developing a retail sales application for a company. You have a table daily_sales that tracks sales data for various products each day.

Table: daily_sales

dateproduct_idsales_amount
2024-09-01101500
2024-09-02101700
2024-09-03101400
2024-09-04101600
2024-09-051011000
2024-09-01102300
2024-09-02102500
2024-09-03102700
2024-09-04102200
2024-09-05102100

In this table:

  • date: The date of the sales transaction.

  • product_id: The unique identifier for a product.

  • sales_amount: The total sales amount for that day.

Goal: Cumulative Sales for Each Product

You want to calculate the cumulative sales over time for each product. For example, the cumulative sales for product 101 should update day by day, accumulating the previous sales totals.

PostgreSQL Query for Cumulative Sales

SELECT 
    date,
    product_id,
    sales_amount,
    SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales
FROM 
    daily_sales
ORDER BY 
    product_id, date;

Explanation:

  • SUM(sales_amount): The aggregation function to calculate the cumulative sum of sales.

  • OVER (PARTITION BY product_id ORDER BY date):

    • PARTITION BY product_id ensures that the cumulative sum is calculated separately for each product.

    • ORDER BY date ensures that the sum is accumulated in the correct date order.

  • ORDER BY product_id, date: Orders the results first by product, then by date.


Result:

dateproduct_idsales_amountcumulative_sales
2024-09-01101500500
2024-09-021017001200
2024-09-031014001600
2024-09-041016002200
2024-09-0510110003200
2024-09-01102300300
2024-09-02102500800
2024-09-031027001500
2024-09-041022001700
2024-09-051021001800

In the above result:

  • For product 101, the cumulative sales grow progressively from 500 on September 1st to 3200 on September 5th.

  • For product 102, the cumulative sales start at 300 and grow to 1800 by the same period.


Use Cases for Cumulative Aggregation in Retail

  1. Daily/Monthly Sales Trends:

    • Track how much sales have accumulated over time for specific products or across the entire store. This helps identify trends, spot seasonal peaks, and monitor sales growth or decline.

Example: If a product's cumulative sales suddenly slow down after a marketing campaign, it could indicate the need for promotional adjustments.

  1. Customer Purchase Patterns:

    • Track customer purchases over time to analyze cumulative spending, which can be used for loyalty programs and promotions.

Example: Cumulative spending over time might help segment high-value customers and tailor promotions accordingly.

  1. Inventory Management:

    • Use cumulative aggregation to monitor how much stock has been sold over time, helping ensure proper inventory replenishment and avoid stockouts.

Example: Retailers can forecast product restocking based on cumulative sales patterns over the past months.

  1. Revenue Forecasting:

    • Cumulative revenue figures can be used to forecast expected sales and budget for future operations.

Additional Features in PostgreSQL for Cumulative Aggregation

  1. ROW_NUMBER():

    • You can combine cumulative aggregation with the ROW_NUMBER() function to track the nth transaction per product or customer.
    SELECT 
        date,
        product_id,
        sales_amount,
        SUM(sales_amount) OVER (PARTITION BY product_id ORDER BY date) AS cumulative_sales,
        ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY date) AS transaction_number
    FROM 
        daily_sales;

This gives the cumulative sum and adds a transaction number for each sale, helping track individual sales across time.

  1. RANK():

    • The RANK() function can be used in combination with cumulative aggregation to rank products based on cumulative sales.
  2. RANGE / ROWS BETWEEN:

    • Fine-tune how many rows should be included in the cumulative aggregation by defining specific ranges or row numbers using ROWS BETWEEN in the OVER() clause.

Performance Considerations

  • Indexing: Ensure that the columns used in the ORDER BY clause are indexed to improve query performance.

  • Partitions: The PARTITION BY clause helps limit the scope of the cumulative sum, which makes it more efficient for large datasets by applying the calculation to partitions rather than the entire dataset.


Conclusion

Cumulative aggregation in PostgreSQL is a powerful tool for tracking progressive metrics such as sales, revenue, and customer purchases in retail applications. It provides a simple yet efficient way to analyze running totals over time, enabling businesses to make informed decisions about inventory, marketing, and operations. By utilizing window functions like SUM() with the OVER() clause, retailers can unlock valuable insights from their data to improve performance and strategy.