One of the common problems we find in our process mining practice, especially in order management, is handling a host of different string descriptions. For example, an order could consist of 10 lines, each with its own description.
Distinct String Aggregations
In Celonis, I often want to show the count of order types for an order. For this I would typically just list out the order type at the line level and count the distinct order header IDs.
But what if I want it on the header level?
Then I'd have to do a pull up. I can easily use PU_STRING_AGG. However, it will list out all of the different types from each line. Even if there are duplicates!
Therefore, you could get something like the following:
BOOK,MOVIE,BOOK,BOOK,PLAY,MOVIE
When what I really wanted to show was:
BOOK,MOVIE,PLAY
A nifty bit of PQL to create a distinct PU_STRING_AGG is as follows:
PU_STRING_AGG(<table>, <field>, ',' , INDEX_ORDER(<field>, PARTITION BY (<foreign keys>, <field>)) = 1)
To explain, the code will aggregate each of the fields to the table level and keep it distinct. It does this by using a conditional as a fourth argument to the pull up function. This conditional uses the INDEX_ORDER function to check if the field was the first iteration of that field based on the partition.
For the partition, we use the foreign keys between the two tables the pull up is traversing. Additionally, we want to use the field because that is the actual field that we are partitioning by. We need the foreign keys so that the pullup returns properly.
This bit of PQL has helped me significantly in presenting data properly. No one wants to see a string list with duplicates. It doesn't present well. You are ready to create your own distinct string aggregations.
Hope this example from ProcessMiningIQ is helpful in your analysis journey!
Commenti