Extracting Power BI Import Mode Job Graph Data To A Table

In my last post but one I showed how you could create a DGML file that contains detailed information on what happens during a Power BI Import mode refresh – the “job graph” – and visualise that in Visual Studio or VS Code, to help with performance tuning. In my last post, I explained the concepts of blocking and waiting which are key to understanding this data. In this post I’ll share and describe a Dataflow template that extracts the data contained in the DGML file to a table for other types of analysis – which turns out to be quite simple, because DGML is an XML-based format which is easy to work with in Power Query and Dataflows.

You can download the .pqt template file here (for more information on how to use templates see here). To use it, create a new Dataflow Gen2 and import the template file. You’ll see the following queries:

Next, change the values of the three parameters to contain the IDs of the refresh whose DGML file you want to load (the notebook in this post that creates the DGML file uses the refresh ID as the file’s name) and the workspace and lakehouse where that file is stored. Finally, set output destinations on either the RefreshJobs and/or RefreshJobsPivoted queries. The RefreshJobs query returns a table with one row per refresh job:

This gives you all the data in the DGML file that was visible using a DGML viewer but in table form. For each job you get the CreatedAt, RunnableAt, StartedAt and FinishedAt datetimes, the Blocked, Wait and Running durations, the Slot ID, the CPU used and whether the job is on the critical path.

The RefreshJobsPivoted query gives you exactly the same data but instead of giving you one row per job, you get three: one row for the blocked, waiting and running phases of each job, with the start and end times for each phase and the duration. This may be a more convenient format for visualisation and analysis:

The Links query gives you the dependencies between the jobs:

Having loaded all this data into a Lakehouse you can now build a report on it. As I said, I’m sure there’s a way of visualising all of this data in a way that shows all the durations and dependencies, but even a basic table report on the RefreshJobs table like this is really useful because it shows you which jobs were blocked, which ones had to wait, and which ones took a long time to run:

In this example (the Dependencies example from my previous post, but where the refresh has maxParallelism set to 1, so there is blocking as well as waiting) you can see that refreshing table X took 60 seconds, that the refresh for table Y had to wait for 60 seconds before it could start and took 10 seconds, and that refresh for table XY was blocked for 70 seconds before it could start. So, if you wanted to make this refresh run faster, you would want to understand why table X was so slow and also look at increasing the amount of parallelism so Y did not have to wait.

One thought on “Extracting Power BI Import Mode Job Graph Data To A Table

Leave a Reply