ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Some good news for those of you using DirectQuery mode in Power BI: the ApproximateDistinctCount DAX function, which returns an estimate of the number of the distinct values in a column and which can be a lot faster than a true distinct count as returned by the DistinctCount function, is now available to use with BigQuery, Databricks and Snowflake sources. It only worked with Azure SQL DB and Synapse before; RedShift is coming soon. You can use it in exactly the same way that you would with the DistinctCount function except that it only works in DirectQuery mode.

For example, I have a Power BI DirectQuery dataset that uses the New York Taxi data in Snowflake as its source. With the following two DAX measures defined on the Trip table:

Approximate Distinct Medallions = APPROXIMATEDISTINCTCOUNT('TRIP'[MEDALLIONID])
Distinct Medallions = DISTINCTCOUNT('TRIP'[MEDALLIONID])

…I can build a table visual that compares the output of the two measures (as you can see, the difference isn’t that big):

…and see that the ApproximateDistinctCount DAX function is translated to the APPROX_COUNT_DISTINCT function in Snowflake SQL:

select { fn convert(count(distinct("MEDALLIONID")), SQL_DOUBLE) } + { fn convert(max("C1"), SQL_DOUBLE) } as "C1",
    approx_count_distinct("MEDALLIONID") as "C2"
from 
(
    select "DATEID",
        "MEDALLIONID",
        "HACKNEYLICENSEID",
        "PICKUPTIMEID",
        "DROPOFFTIMEID",
        "PICKUPGEOGRAPHYID",
        "DROPOFFGEOGRAPHYID",
        "PICKUPLATITUDE",
        "PICKUPLONGITUDE",
        "PICKUPLATLONG",
        "DROPOFFLATITUDE",
        "DROPOFFLONGITUDE",
        "DROPOFFLATLONG",
        "PASSENGERCOUNT",
        "TRIPDURATIONSECONDS",
        "TRIPDISTANCEMILES",
        "PAYMENTTYPE",
        "FAREAMOUNT",
        "SURCHARGEAMOUNT",
        "TAXAMOUNT",
        "TIPAMOUNT",
        "TOLLSAMOUNT",
        "TOTALAMOUNT",
        case
            when "MEDALLIONID" is null
            then CAST(1 as INTEGER)
            else CAST(0 as INTEGER)
        end as "C1"
    from "NYCDATA_DB"."NYCTAXIDATA"."TRIP"
) as "ITBL"

Distinct counts are often the slowest type of measure but in my experience report consumers are very unwilling to accept seeing “near enough” numbers in their reports rather than numbers that are 100% accurate, even if the approximate distinct counts are much faster. I heard someone suggest using field parameters to allow report consumers to switch between showing fast approximate distinct counts for exploration and accurate distinct counts when they really need them, and I think this is a great compromise.

Bonus links: if you need to do an approximate distinct count in Import mode, Phil Seamark shows how to do this here; Phil also wrote a great post on building aggregations for distinct counts (read it carefully – this is a really powerful technique!) here.

One thought on “ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Leave a Reply