Path of the Data Lesson 1 – Celonis Data Connection; a Process Mining Tip

James Newman
Nov 9, 2023
4 min read

Getting to your data quickly is critical to getting the most from your process mining investment. James Newman explains how to set up a data connection, using Celonis as the example, to extract data from an on-premise source system. James provides an overview of the data flow from the source system, through EMS Data Integration in the cloud, and into Studio for analysis.

In this video you will learn:
⚡ How to set up the cloud data connection and link it to the on-premise extractor uplink, step by step.
⚡ The key steps: naming the connection, selecting the extractor uplink, and specifying the database type, host, port, database name, and access credentials.

This video focuses solely on configuring the initial data connection to an on-premise database. Stay tuned for more tips and process intelligence how-tos!

Watch the tips here and you can read the transcript afterwards.

Hi, my name is James Newman. This is the first in our ProcessMiningIQ series on the Path of the Data: how to extract data from a source system, load data from a flat file, create an event log, and analyze it.

So we're starting off here in the first video talking about how to set up a connection. But even before that, I'm going to just give a little bit of an overview of what you're seeing here and how it all works together.

I call it the “Path of the Data,” and it's this visual that you see when you click Data Integration: a nice flow going from your system, the data sources, into the EMS Data Integration, and then on to the EMS Data Consumers, which is Studio.

I call my data pool Aardvark for no reason other than that it comes up first when you list the data pools.

So you see right now I do have a connection set up already, and that's connected to a server that I'm maintaining in my home network. You might say that sounds very insecure. It's really not, because all the communication is going out of my network and then touching the cloud, so it's actually quite secure, quite easy to maintain, and quite easy to set up.

I'm in my AppDev environment here, but I also have another tab open that I'm going to switch over to, which is the documentation on how to set up an on-premise extractor. That's basically what I've done here. What you use is a Java file that has the plugins for a variety of different databases. In this situation it's MySQL. You set the file up to run, and then it just sits on the server and will occasionally ping the EMS system.

I'm not going to walk through all of it. It's just important to know that it's Java, all the docs are here, I followed them, and it works pretty well. I could go over to my terminal and show you that on my server this file is running as a service, and it's going to keep running there until I turn off my computer. You could also set it up to run all the time.

These are the instructions – I'm going to put this link in the video description and it'll walk you through everything. So once you're done with that you're going to get to the Data Connection here and now we're going to talk about how to set up the data connection.

You click Add Data Connection and it's going to pop up a screen that looks exactly like this. There's really no difference except mine's filled out. So I filled in the name "My SQL Connection" and I selected the MySQL uplink. The uplink is what has the IP address of my home network.

Together with the uplink, the connection has the information about the MySQL server, starting with the Database Type: MySQL, Standard Connection. Then it gets a little interesting. I set the Host here to localhost, because that's what it's telling the extractor JAR file, the server program that's running on my server machine: "Oh, you want to go to the host localhost."

The MySQL server is running on the same machine the extractor is running on, so I can just say localhost. If it were a machine that was co-located but on a different IP address, you would put that hostname in there.
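Before filling in the Host field, it can help to confirm from the extractor machine that the database host actually answers on its port. The sketch below is only an illustration of that check, not part of the Celonis tooling: the helper name `can_reach` is made up, and 3306 is simply MySQL's default port.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only proves that something is listening,
    not that the MySQL credentials or database name are correct.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, or hostname not resolvable
        return False


if __name__ == "__main__":
    # Run this on the machine where the extractor runs.
    # "localhost" matches the setup described above; use the other
    # machine's hostname if MySQL lives elsewhere.
    print(can_reach("localhost", 3306))
```

If this prints `False` for the value you plan to enter as Host, the extractor will most likely not reach the database either.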

Then the Port, which for MySQL is 3306, and then the name of the database on the MySQL server. Mine is just "test." Then you have to enter a user that has the proper permissions to get to that database, and that user's password.
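The form fields described so far can be summarized as a small data structure. This is a sketch for orientation only: the class and field names are invented here and are not Celonis identifiers, the user name `celonis_reader` is hypothetical, and the assumption that a read-only `SELECT` grant counts as "proper permissions" is mine, not something the video states. Because the extractor is a Java program, the JDBC-style URL shows how host, port, and database name typically combine for MySQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSettings:
    """Illustrative mirror of the Add Data Connection form (names are hypothetical)."""
    name: str
    uplink: str
    database_type: str
    host: str
    port: int
    database: str
    username: str

    def jdbc_url(self) -> str:
        # Standard MySQL JDBC URL layout: jdbc:mysql://host:port/database
        return f"jdbc:mysql://{self.host}:{self.port}/{self.database}"


def read_only_grants(settings: ConnectionSettings, client_host: str = "localhost") -> list[str]:
    """Build MySQL statements for a read-only user (assumes SELECT is sufficient).

    '<password>' is a placeholder to replace before running the statements.
    """
    account = f"'{settings.username}'@'{client_host}'"
    return [
        f"CREATE USER {account} IDENTIFIED BY '<password>';",
        f"GRANT SELECT ON `{settings.database}`.* TO {account};",
    ]


settings = ConnectionSettings(
    name="My SQL Connection",
    uplink="MySQL uplink",
    database_type="MySQL",
    host="localhost",
    port=3306,
    database="test",
    username="celonis_reader",  # hypothetical user name
)
print(settings.jdbc_url())  # jdbc:mysql://localhost:3306/test
for statement in read_only_grants(settings):
    print(statement)
```

The `client_host` default of `localhost` matches the case where the extractor and MySQL share a machine; it would need to change if the extractor connects from another host.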

You do have some advanced settings, such as the maximum number of parallel extractions, the timeout, the pseudonymization algorithm, things like that. I don't usually touch these. You might want to touch the parallel extractions if things are taking too long or running too slow.

I added another couple of extractions to that. So you test it: it'll ping and say, yep, it is good. How that works is that when I click that button, I'm just changing a flag on the EMS server in AWS, Azure, wherever it happens to be in the cloud. The next time my server pings the cloud, it's going to check and say "Oh yep, we want to test that connection," and it's good.

If it sits there for a while and it's not good, so if I go turn off my server, that Test Connection will time out after a certain amount of time, about 60 seconds, and tell you "Oh no, it's not reachable." It's not really that the database is unreachable. What it's reading is "Oh yeah, we're not getting a ping from the extractor, so it's not really alive." So that's where that lies.
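The flag-and-ping behavior described above can be pictured with a toy model. This is not how Celonis implements it. Every name below is hypothetical, time is simulated rather than slept, and the 60-second timeout is just the approximate figure mentioned in the video. The point is only that the cloud never calls into the home network: it sets a flag and waits for the extractor's next outbound ping.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CloudSide:
    """Hypothetical stand-in for the cloud: a 'test requested' flag plus the reported result."""
    test_requested: bool = False
    result: Optional[bool] = None

    def click_test_connection(self) -> None:
        # Clicking the button only flips a flag; nothing is sent to the extractor.
        self.test_requested = True
        self.result = None


def extractor_ping(cloud: CloudSide, check_database: Callable[[], bool]) -> None:
    """One outbound ping from the on-premise extractor to the cloud."""
    if cloud.test_requested:
        cloud.result = check_database()
        cloud.test_requested = False


def wait_for_test(cloud: CloudSide, tick: Callable[[], None],
                  timeout_s: int = 60, step_s: int = 5) -> str:
    """Advance simulated time in steps until a result arrives or the timeout passes."""
    waited = 0
    while waited < timeout_s:
        tick()  # whatever happens on-premise during this time step
        waited += step_s
        if cloud.result is not None:
            return "connected" if cloud.result else "database check failed"
    return "not reachable (no ping before timeout)"


cloud = CloudSide()

# Extractor is running: its next ping picks up the flag and reports back.
cloud.click_test_connection()
print(wait_for_test(cloud, tick=lambda: extractor_ping(cloud, lambda: True)))

# Extractor is switched off: nothing ever pings, so the test times out.
cloud.click_test_connection()
print(wait_for_test(cloud, tick=lambda: None))
```

In this model the "not reachable" message says nothing about the database itself. It only means no ping arrived in time, which matches the explanation above.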

So once you get that, you save it. I don't want my browser to save the password. It's been saved successfully, so you can go on and create a data job and create an extraction. In the next video I'll talk about that. This one was just about setting up the data connection so you can load the data.

Hope this was helpful. Please comment and ask any questions about how to use the Data Connections feature. Have a great day!
