Running Sum in Tableau Prep

Tableau Prep doesn’t (as of May 2020) do all the table calculations that Tableau Desktop can do, but we can teach it to! As part of supporting Tableau’s COVID-19 response Anya A’Hearn and I were using Tableau Prep to work with a new data set that had daily counts but not the cumulative totals (i.e. running sums) that some users require, and in looking around the internets there wasn’t a purpose-built how-to for running sums in Tableau Prep so here’s one!

Using running sums or cumulative totals: A COVID-19 caution

If you’re looking to show trends of a number then running sums are not particularly useful because the running sum is accumulating…with a measure like “# of tests” the running sum is always going to stay level or go up (the only reason to go down would be if there was mis-reported data). Here’s an example from a US COVID-19 press conference in early April 2020. On the left is the displayed chart of cumulative daily tests that looks great, but the actual daily tests tells a very different picture.

Screen Shot 2020-05-05 at 7.31.14 PM.png

Steve Wexler did a discussion of this at a recent Chart Chat: https://youtu.be/mz-qmrOyUKA?t=816


Ok, that’s great, I really do want a running sum!

In this post we’ll go through an explainer on how building running sums in Prep at present is similar to-but-different from how we do it in a tool like MS Excel with a simple 4 record data set, then extend that to doing a running sum with a more complex data set.

Here we’ve got a simple four record data set with a value for each day that we want to build a running sum on:

Screen Shot 2020-05-05 at 7.38.32 PM.png

In Excel doing a manual running sum then we’d add the current day’s value to the prior running sum value, like this where the B# cell references are the current day’s value and the C# cell references are the running sum:

Screen Shot 2020-05-05 at 7.39.30 PM.png

If we expand the C# cell references, though, what we can see is that each successive cell is summing up all the prior cells:

Screen Shot 2020-05-05 at 7.40.43 PM.png


And this points to how we’re going to build the running sum in Tableau Prep: we can use a Join step in Prep to create an intermediate data source such that for each row of data it has the values from all prior rows, just like for 3/1/20 in the screenshot above the running sum is adding up values B2+B3+B4.  Here’s what the intermediate data set will look like:

Screen Shot 2020-05-05 at 7.41.34 PM.png

Then to build the final data set we’ll use an Aggregate step to sum up all of the data for each date and generate the running sum:

Screen Shot 2020-05-05 at 7.42.14 PM.png

The basic principle here is that although Prep doesn’t do row-by-row operations (except for the ranking functions, as of May 2020) what we can do is self-join the data to give each original row of data additional row(s) of data to simulate row-by-row operations using aggregations and/or calculated fields. With this principle in mind we can do lookups, difference from prior, moving averages, and so on.

Here’s are the core steps for building this in Tableau Prep:

  1. Connect to your data.

  2. If need be aggregate your data to the desired level. For example in Superstore Sales that is at the level of individual transaction if we want to do a monthly running sum then we’d need to aggregate to the level of Month first.

  3. Add a Clean step to prepare for the self-join.

  4. Add a Join step as a non-equi-join with any grouping/partitioning fields in the running sum.

  5. Add an Aggregate step with any grouping/partitioning fields in the group by to create the running sum.

  6. If necessary add a Clean step to validate.

Simple Data Example

Walking through the steps from above, here’s the entire flow:

Screen Shot 2020-05-05 at 7.43.48 PM.png


1. Connect to the data and add an Input step.

2. No aggregation is necessary in this case prior to the running sum, so we don’t need an intermediate Aggregate step.

3. Add a Clean step. In the Clean step change the name of the Value column to Running Sum. The Running Sum field doesn’t have the exact value of the running sum yet, it will in a couple of steps.

Screen Shot 2020-05-05 at 7.45.08 PM.png

4. Add a Join step by dragging from the anchor on the Clean step back to the Input step. This makes Input step (our original data) the left side of the join and Clean step (the prior data that we’re joining) the right.

a. Configure the Join to be on Date >= Date, so that way we keep all dates from the left side (our original data) and only rows with dates on or before the date from the right side. We can get a sense of this by the nice “stair-step” effect on the Date field.

Screen Shot 2020-05-05 at 7.48.17 PM.png

5. After the Join step add an Aggregate step that groups on the Date and (original) Value and sums up the Running Sum. The output is now a running sum.

Screen Shot 2020-05-05 at 7.49.21 PM.png

6.In this case the data is small & simple enough that we can validate all the results in the Aggregate step so we don’t need a Clean step.

That’s how to build out a running sum! Here’s a link to the running sum simple data.tflx packaged flow..


Grouping or partitioning the running sum

A slightly more complex use case is when we need to group or partition the running sum, for example the running sum of daily data by state. Here’s a sample data set with 10 days’ worth of daily data from three states:

Screen Shot 2020-05-05 at 7.51.52 PM.png


One thing we can see in the data (and this screenshot in Tableau Prep) is that the first date for each state isn’t the same. 

Screen Shot 2020-05-05 at 7.53.53 PM.png

This isn’t an issue for this use case, but it is something to keep an eye on in the validation. Using the above method we have to make sure that steps 4 (Join) and 5 (Aggregate to compute the running sum) include the grouping dimension(s), so State in this case, and we’ll use step 6 (Clean to validate).

For the Join step (step #4 in the above method) we need to add State to the join clauses:

Screen Shot 2020-05-05 at 7.55.26 PM.png

Note that the “stair step” effect on Date is mixed because different states were reporting on different dates, we can use the Prep’s brushing capabilities to review each state:

Screen Shot 2020-05-05 at 7.55.52 PM.png

Then in the Aggregate step (step #5 in the above method) we also include the State in the group by:

Screen Shot 2020-05-05 at 7.56.36 PM.png

Personally I find that it’s little harder to validate in the Aggregate step based on Prep’s display, so I’ll add a Clean step (step #6 in the above method) and do the validation there. Here I’ve selected Idaho and just a few dates so I can see that the running sum is accurately computed:

Screen Shot 2020-05-05 at 7.58.08 PM.png

That’s grouping a running sum..here’s a link to the running sum daily data.tflx packaged flow.


Running sum on aggregated transactional data

In this final example we’ll use Tableau’s Superstore data that is at the level of transaction to create a monthly running sum of Sales by Segment. Recognizing that the data that we have isn’t exactly the data that we need (for our computation) is a key skill for working with data & Tableau. Our computation requires the monthly sales by segment (to calculate the running sum) and since the data has daily transactions we’ve got a lot more rows than what we ned. Usually when we want to go from more rows (the transactions) to fewer rows (monthly sales by segment) we want to use an aggregation and that’s what we’ll do in this case.

Therefore using the above method for step 2 (aggregate if need be) we’ll use a couple of steps: a Clean step to build a Month field using DATE(DATETRUNC(‘month’, [Order Date])) and then aggregate the data to the level of Month & Segment:

Screen Shot 2020-05-05 at 8.05.15 PM.png


Then from there we can use the same steps as previously demonstrated 3. Clean to prepare for join, 4. Join, 5. Aggregate to compute the running sum, and 6. Clean to validate the running sum:

Screen Shot 2020-05-05 at 8.06.24 PM.png


And that’s how to do a running sum of aggregated data! Here’s a link to the running sum superstore.tflx packaged flow.


Conclusion

Running sums and other row-by-row computations are possible in Prep with a little extra work and the packaged flows for each of the scenarios (simple running sum across all data, grouping, and then aggregating prior to running sum) are linked to in each example above. If you’d like Prep to have native support for running sums then please vote up the Add table calculations... feature request on the Tableau Community Forums (it’s a pretty popular one).