Path of the Data Lesson 3 - Transforming Data

Here's how to use process mining transformations for data preparation.

This third in our Path of the Data series (Lesson 1 and Lesson 2) introduces transformations in process mining's data preparation workflow. Transformations enable manipulating extracted data stored in the EMS cloud SQL warehouse before loading it into a data log.

I show you Celonis' SQL editor interface for writing queries and scripts to shape data. Key features covered include selecting database schemas, previewing tables and columns, saving code snippets to avoid accidental overwrites, adding job parameters for elements like start dates, creating tables, and executing test queries.

Watch the video below. You can also follow along with the transcript after the video.

Hi, my name is James Newman. This is the third video in our Path of the Data class on Transformations. Transformations are where you can transform the data in the ETL - it's often creating the event log out of a relational database. In that situation you'd be creating a table for the activity log, adding activities to it, things like that.

I'm going to give you a brief intro into how to create a transformation, what you can do in it, some tips and tricks, and then show you the outcome.

I just finished the execution from the extraction - as I mentioned in video two, I demoed it showing a table with no rows. So I redid it and added about 30 rows to the activity table on our server. Now in this extraction that just finished, you see that we loaded 33 records from the test MySQL database.

So now that data is in the Vertica SQL data storage in the cloud. Now we want to create a transformation - that will let us look at the data, edit it, modify it, create new data.

We're going to add a transformation here called Sandbox for now. It brings you right into a SQL editor where you can run code and you have a schema explorer. There are going to be multiple schemas - there's always the global scope and then for this MySQL database - you'd be able to switch between them. That terminology "schema" - if you're familiar with SQL this is pretty common, but it's important to know that's what it means.

You see the table here "video_test" with our case key, event time, activity, Celonis change date that is added by Celonis, it's when that data was executed or extracted and loaded. That can be handy - we've used that field as the activity timestamp to track changes basically.

Now we can write SQL and execute it - and now we see our event log with case key, event time, activity. Let's change today - this SQL editor is really handy. Some things I really like to do - they have some Vertica SQL database tables that are really handy.

You can see all the tables across schemas - claims data, things like that. Then you can also do a "FROM columns" - so I can view where table_name equals "video_test" and I'll see the four columns pop up as rows here. This will give you the data type too - so if you're pulling data and not sure what it is, this table lets you see "Oh this is a VARCHAR, this is a timestamp," etc. So that can be really useful.

I always save it - one big trick I've learned - I'll often be programming, walk away for hours, and leave this screen open. If it's open for a long time you may get logged out of Celonis but this screen will stay like you left it. So if you write SQL, save it, then exit out of the transformation by clicking the back arrow. Otherwise when you log back in and make a small change and hit Save, it'll overwrite anything done in the meantime in a different tab. So that version control is important.

This transformation also has parameters at the job level for things like start dates. We often use this for a start date so we're only going to pull data from this date forward.

It's enabled again - click the buttons to disable if you don't want to run it. Info button if you have a description for it. If you execute it now, it won't show up here - you can select/unselect it and then it'll run.

A couple pro tips - this would fail here because you need a semicolon - make sure it's proper syntax. That semicolon can cause errors if you forget about a SELECT in there.

So that's transformations - if it's an out of the box connector there'll typically be more, like creating an activity table, adding tables, creating views. You can create and edit tables too - so now if I refreshed, I'd see "test2" the same as "video_test". Pretty much all SQL database functionality is there, some audits, table sizing, etc.

I hope this was helpful - transformations are a great way to view and validate data, doing SELECTS and COUNTS because it's closer than the full load and you should be testing the transformation output too.

I'll see you in the next video where we'll talk about data model loads. Have a great day!

Transformations facilitate cleaning, shaping, and enriching extracted data as part of a flexible cloud-based data integration pipeline to empower process analytics.

Need help! Contact us today!

Don't forget to sign up for our Process Mining Tip of the Week! Click on the image, share your email, and you'll get a helpful tip in your inbox every week!

Path of the Data Lesson 3 - Transforming Data

Recent Posts

Comments