Power BI DirectQuery Best Practices Video

Here’s a recording of a session I did for the Manchester (UK) Power BI user group recently on best practices for DirectQuery mode in Power BI:

I’ve done it for a few other groups over the last six months but this is the latest and best version, I think.

I’ve worked with several customers using DirectQuery mode since I joined the CAT team and learned a lot along the way. Some of this knowledge has been written up by me (eg this post on Snowflake DirectQuery mode best practices), some of it by the people who work with the data sources Power BI runs on top of (see posts by Dany Hoter on the Azure Data Explorer blog, for example; there’s also going to be a lot of new material on DirectQuery for Databricks coming soon, with this post as a start). There’s a lot of detailed information in the docs too, for example here, here and here.

But remember folks: the most important piece of advice around DirectQuery is to not use it unless you’re really, really, really sure you have no other option. It’s possible to make it work well but it takes a lot more tuning and specialist skill than Import mode!

17 thoughts on “Power BI DirectQuery Best Practices Video”

  1. I’ve always wondered – isn’t Import mode just DirectQuery by another name? The difference seems to be the data source location:

    in “DirectQuery” mode it is the (presumably on-prem) data source that (hopefully) can handle the volume of custom queries and all the network traffic.

    “Import mode” eventually results in a Power BI dataset in the Azure cloud that behaves the same as a DirectQuery source, but with (allegedly) guaranteed/stable performance and network throughput. The drawback seems to be the loss of immediacy.

    1. No, I would say Import mode and DQ are very different. A Power BI report on an Import mode dataset is like you going home and talking to your mother in your native language. Power BI on a DQ dataset is like you having a Teams call with someone in another country using machine translation: you can understand each other, and the machine translation gets better every day, but communication is still not as efficient as it could be.

  2. Thank you very much for sharing this.

    >> But remember folks: the most important piece of advice around DirectQuery is to not use it unless you’re really, really, really sure you have no other option

    I did not realize this was your stance. I’ve had minimal use for DirectQuery thus far (outside of composite modeling aka “DirectQuery for PBI Data”). But had always heard bloggers say that DirectQuery was “essentially ROLAP done right”. I think there is a crossjoin blog which stated this back in November 2010…

    The primary reason I hoped to use DirectQuery is to compensate for the expensive cost of memory in PBI Premium P1. So far we have stayed under our 25 GB limit, but as more datasets are hosted on premium, I was expecting that we would need to start using some DirectQuery models for overflow purposes.

    1. “ROLAP done right” does sound like something I would say, but with experience I’m now more wary of DQ. I’m not as anti-DQ as Marco is, for example, and I *have* seen successful implementations on very large data volumes (especially with Azure Data Explorer which, BTW, is amazing technology) but I have also seen plenty of failures caused by people ignoring the best practices and by other limitations. We’re doing a lot of work to improve DQ and there are still more changes to come, so hopefully in a year or so the situation will be a lot better.

      >>The primary reason I hoped to use DirectQuery is to compensate for the expensive cost of memory in PBI Premium P1.
      The interesting thing is that we have found that Import mode usually works out cheaper than DirectQuery mode for customers: yes, you need to pay for Premium but the overall cost works out to be less than paying for the equivalent resources on almost all cloud data warehouse platforms.

      1. >> cost works out to be less than paying for the equivalent resources on almost all cloud data warehouse platforms….

        I see what you are saying. Any underlying resource which you expect to have great (OLAP-ish) performance is probably going to cost lots of money anyway, and you may as well use PBI’s RAM instead.

        In my case, I just have an Azure SQL elastic pool that runs the whole month for just ~$1K. It doesn’t get heavily used during the business day (our ETLs primarily run overnight). So I have some comfort that it would be suitable for a reasonable level of reporting work. As I mentioned, it would only be used for the “overflow” workloads in “second-class” datasets.

        In any case, I’m not talking about using an expensive cloud data warehouse platform (e.g. an expensive Synapse dedicated pool or a Snowflake database). I would certainly think PBI’s RAM would be cheaper than those things. We have been able to rely on Azure SQL for storage instead. It seems to be a great value, if you can deal with the annoyance of the I/O throttling that impacts us on occasion (during multi-million row Spark upserts).

  3. It was a good video; I found it on YouTube and now I’m going through your blog.

    I am new to Power BI, and have only used Import mode thus far.

    I have a fairly small SQL Server database, around 12GB, and I’m looking at a workflow for getting near real-time results onto a dashboard. Do you have any blogs you would recommend in terms of an alternative workflow to using DirectQuery where SQL Server is the source?

    We have been investigating SQL transactional replication and then connecting Power BI to the replicated database.

    1. One alternative workflow would be to use Import mode and refresh on a very regular basis, probably using incremental refresh and triggering it via the Enhanced Refresh API (so you get around the limits on the number of scheduled refreshes per day in the Service). But the question should always be: why do you need near real-time results? Users always ask for this without understanding the costs or implications.
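[For readers curious what triggering a refresh via the Enhanced Refresh API looks like, here is a minimal sketch against the Power BI REST API. The workspace ID, dataset ID and access token are placeholders, and the body settings shown are illustrative, not a recommendation:]

```python
# Sketch: triggering an enhanced (asynchronous) dataset refresh via the
# Power BI REST API. IDs and the AAD token below are placeholders.
import json

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_refresh_request(group_id: str, dataset_id: str, token: str):
    """Build the POST request for an enhanced dataset refresh."""
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # Sending a JSON body is what makes this an *enhanced* refresh;
    # a plain POST with no body performs a standard refresh instead.
    body = {
        "type": "full",
        "commitMode": "transactional",
        "applyRefreshPolicy": True,  # honour the incremental refresh policy
        "retryCount": 2,
    }
    return url, headers, json.dumps(body)

# Actually sending it (requires the `requests` package and a valid token):
# import requests
# url, headers, body = build_refresh_request("<workspace-id>", "<dataset-id>", "<token>")
# resp = requests.post(url, headers=headers, data=body)
# resp.raise_for_status()  # 202 Accepted means the refresh was queued
```

[Because the refresh runs asynchronously, the API returns immediately; you can poll the same `/refreshes` endpoint with a GET to check progress.]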

      1. We are a live venue with people coming through the door, so it’s for operational workers to see how many people are in the building. We have a legacy dashboard at the moment, and I’m looking for a cost-effective way to replace it. Additionally, I want to take advantage of Power BI to look at other parts of the business (that wouldn’t refresh in real time).

      2. OK, so that does sound like a valid scenario for real-time :-). I guess the measures you would like to use are fairly straightforward – counts and sums? In that case maybe DirectQuery is a good fit.

  4. Great video, Chris. Can you elaborate further on the very last bullet point of the Report Design slide where you say “Paginated report visuals can be a great alternative here”

    Are you talking about setting up a button to trigger an export with more data or something, rather than trying to display the data in the visual?

    1. All I really meant was that when your users need a large table, it’s sometimes because they want to export the data – in which case, as you’ve seen, a paginated report visual gives you more control over the query and the export behaviour.

    2. Paginated reports, I have found, are the Achilles heel. Perhaps you can elaborate on how Power BI shares an imported dataset across hundreds of users. It’s my understanding that paginated reports do not share a dataset. So if I import data as suggested in this article, e.g. 20GB, and 400-500 users run the report at 8am, does Power BI hold 500× the 20GB since it does not know how to share?

      1. You’re getting confused between a Power BI dataset and what paginated reports call a dataset (which is understandable). If you import data into a Power BI dataset and use that as the source for your paginated reports, the data is only stored once.

  5. Hi Chris, first, great video – much appreciated.

    I’m curious: you mention at 29:50 in the video that you were working on a collaboration with Databricks concerning best practices for direct query mode, and that it would be published in a couple of months. Are you still planning to publish something?

    I ask because we’re in the process of migrating to Databricks. There’s a chance we might need to use direct query mode due to the volumes of data involved and it would be good to ensure we’re making the right design decisions in advance.

    Many thanks!
