Text Search Performance In Power BI

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

First of all, a brief description of the problem. Let’s say you have a Power BI dataset containing just the following table:

Let’s also say you want to build a report where a user can search for certain terms in the Description field and display a filtered table in a report:

Note how the Description field has been dragged into the Filter pane, the filter type is set to “Advanced filtering”, the “Show items when the value” dropdown has “contains” selected and the search term is “citrus”, so the table on the left only shows the fruit where the description includes the text “citrus”. Some custom visuals such as the Text Filter visual have very similar functionality.

Here’s the DAX query generated for the table visual in this screenshot:

DEFINE
	VAR __DS0FilterTable = 
		FILTER(
			KEEPFILTERS(VALUES('Fruit'[Description])),
			SEARCH("citrus", 'Fruit'[Description], 1, 0) >= 1
		)
	VAR __DS0Core = 
		CALCULATETABLE(
			SUMMARIZE('Fruit', 'Fruit'[Fruit Name], 'Fruit'[Description]),
			KEEPFILTERS(__DS0FilterTable)
		)
	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Fruit'[Fruit Name], 1, 'Fruit'[Description], 1)
EVALUATE
	__DS0PrimaryWindowed
ORDER BY
	'Fruit'[Fruit Name], 'Fruit'[Description]

As you can see, the filter in this query is accomplished using the DAX Search() function. This is a great example of the type of query that the optimisation I’m talking about can speed up.

Here are some more details about how this optimisation works:

  • This optimisation is now enabled and works in the same way in both Power BI Desktop and the Power BI Service.
  • The first time that any query or measure that uses either the Search() or ContainsString() DAX functions on a text column is evaluated, Power BI starts to build a special text index just for that column.
  • This index build will only be successful if two conditions are true:
    • The text column must only contain characters from the classic 128 character ASCII set.
    • The index build must take less than 25 seconds. If 25 seconds elapse then the build will timeout and Power BI will continue to run the query without the index being present.
  • If the index build succeeds for that column then the index can be used by all subsequent queries by all users, but it will be dropped when:
    • Power BI Desktop is restarted, if you’re in Power BI Desktop.
    • The dataset is refreshed, either in Power BI Desktop or the Power BI Service.
    • The dataset is evicted from memory in the Power BI Service or when the dataset is under memory pressure.
  • DAX queries that use the index will be a lot faster than queries that do not, although the difference will only be noticeable when you are searching on a table with thousands of rows and in a column with reasonably long text values.
  • There is no way for you to know whether an index has been built or whether the build has failed, or if a DAX query uses an index, unfortunately. However if you look at the duration of the DAX queries that do this kind of search (for example in Log Analytics or by running a Profiler trace) and you see the first query after a refresh is relatively slow and subsequent queries are almost instant then it’s likely that the index has been built successfully; on the other hand if your queries are consistently slow then it’s likely the index has not been built successfully. Before you all leave comments complaining, I know this is not ideal and I hope we’ll be able to make further improvements in the future.

How can you ensure that the index is built successfully? The only way to ensure that you stay under the 25 second timeout limit is to reduce the amount of text that needs to be indexed, either by reducing the number of rows in the table or by reducing the amount of text in the column. Reducing the amount of text in the column is probably the only feasible option: for example you may be able to remove words like “and” and “the” from your text since users are less likely to need to search for them. Making sure your text column only contains ASCII characters is much harder because you can’t just remove all non-ASCII characters (such as characters with diacritics) without making your text unsearchable; I have a good-enough solution in the form of an M custom function which removes diacritics from characters in text and ensures that only ASCII characters are left here.

[Thanks to Jeffrey Wang for the information in this post]

10 thoughts on “Text Search Performance In Power BI

  1. When the new Dataset Scale Out feature is active, will it also truncate the index after refreshing the dataset? Will the index be rebuilt because potential new data arrived after a refresh or because of potential memory pressure during processing?

    1. Yes, I’m pretty sure that the index will be dropped after a refresh even with the new scale-out feature. It gets dropped after any refresh – it doesn’t matter what memory is used during the refresh. The dataset can come under memory pressure at other times apart from refresh though so the index could be dropped at other times too.

Leave a Reply