Gateways And Measuring Power Query CPU Usage During Power BI Dataset Refresh

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. I couldn’t use the same Power Query query I used in last week’s post (it turns out you can’t force the use of a gateway when there isn’t an external data source) so instead I used another dataset that connects to a large CSV stored in ADLSgen2 storage and does a group by operation – something which is guaranteed to be very expensive in terms of CPU for Power Query.

Here’s what Profiler shows for the refresh operation when no gateway is used:

The refresh took around 30 seconds and used around 44 seconds of CPU time.

Here’s what Profiler shows when the refresh does use a gateway:

The refresh takes a lot longer, around 103 seconds (as you would expect – instead of loading the data from ADLSgen2 storage in the cloud to the Power BI Service, it has to take a round trip via the gateway on my PC) but the important thing is that the CPU time is now very low – 141 milliseconds.

So, as you might expect, the CPU time for refreshes that use an on-premises data gateway is not shown in Profiler traces because, as I said, all the work done by the Power Query engine is done on the gateway machine and not in the Power BI Service. Making refreshes use a gateway, even when you don’t need to, can be a way of taking load off a Power BI Premium capacity if it’s overloaded.

This in turn raises the question of how you measure Power Query CPU usage on a gateway? As far as I know it isn’t possible for individual Power Query queries (I could be wrong though), although the gateway logs do allow you to capture CPU usage for the whole machine. Better gateway monitoring tools are on the way but this seems like a good time to mention my colleague Rui Romano’s open source gateway monitoring solution (article | repo) which makes understanding the gateway logs a lot easier.

5 thoughts on “Gateways And Measuring Power Query CPU Usage During Power BI Dataset Refresh

  1. I’ve been looking for some improved monitoring of the ADF gateways. I’m going to see what can be stolen from Rui Romano’s solution

  2. Thanks very much for taking the time to write this Chris, this was an awesome post!

  3. Thanks, Chriss, for this proof of internal working while using /adding the gateway to the Power BI Infrastructure!

    The one thing that is not really being clear to me is when we are looking at the Premium Capacity Gen2 Utilization Metrics how are the background operations (mainly Power Query operations) metrics impacted?

    We know about the “smoothing” effect of allocating / bucketing the operations from the last 24 hours, but how we should read those metrics in clients with heavy reliance on the on-prem gateways.

    The common sense and your article proofs that by using on-prem gateways we “remove” CPU utilization (seconds) but the overall refresh (duration) times are worst.

    My point is, can we make any sense of what is happening using capacity metrics in this kind of scenarios?

Leave a Reply