Since Snowflake announced its official launch on Azure in November 2018, there has been steady growth in the adoption of Snowflake accounts on the platform, leading to a significant increase in the use of Power BI as a data visualization and analytics tool. This trend has generated growing interest among customers looking to understand the best strategies for successfully deploying Power BI in conjunction with Snowflake.
Snowflake continues to invest in strengthening its collaboration with Microsoft Azure, making constant improvements to its integration with Microsoft Power BI. For 16 consecutive years, Power BI has held a Leader position in the Gartner Magic Quadrant, and it remains an essential data analysis, visualization and reporting tool for Snowflake customers.
Snowflake is positioning itself as the cloud platform of choice for experts and enterprises, and Power BI remains the leading business intelligence platform on the market. In this article, therefore, we explore best practices for optimizing the joint use of these two platforms.
First, however, we answer some frequently asked questions about Snowflake and its integration with Azure and Power BI.
Snowflake is a cloud data warehousing and analytics platform that provides an innovative approach to data management. It is a relational database designed for scalability and performance in cloud environments such as Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP). Unlike traditional databases, Snowflake is built on a multi-cluster architecture that separates storage and compute, enabling each to scale independently.
Azure Snowflake is the collaboration between Snowflake and Microsoft Azure that enables users to leverage Snowflake capabilities on the Azure platform. Azure Snowflake combines the scalability and performance of Snowflake with Azure's infrastructure and cloud services. This provides customers with a comprehensive solution for cloud data storage and analytics.
With Azure Snowflake, users can leverage Snowflake's capabilities to manage and analyze their key business data, while benefiting from the security, reliability and elasticity of the Azure cloud. This collaboration has been very successful and has led to a significant increase in adoption of Snowflake on the Azure platform.
Power BI and Snowflake are two platforms that complement each other perfectly, providing users with a comprehensive solution for data storage, analysis and visualization in the cloud. Snowflake, with its multi-cluster architecture and separate storage and compute, offers independent scalability and performance, while Power BI, as a leading BI and analytics tool, is essential for enterprise data visualization and analysis.
The collaboration between Snowflake and Microsoft Azure has enabled customers to deploy Power BI alongside Snowflake on the Azure platform, giving them the ability to visualize and analyze their data more efficiently. This trend has led to steady growth in the adoption of Snowflake accounts on the platform, demonstrating the value of using Power BI and Snowflake together. With these two platforms working together, users can make the most of their data, gain valuable insights and make informed decisions to drive business success.
Here are some essential practices for establishing a successful workflow when combining Power BI and Snowflake, as well as improving Snowflake's performance.
We recommend using the native Snowflake connector in both Power BI Desktop and Power BI Service when connecting Power BI to Snowflake, rather than setting up the generic ODBC driver to connect to Snowflake.
Snowflake and Microsoft are working closely together to continually improve this connector, with recent updates in July and October 2023 and further optimizations on the way.
When choosing between Import (copying all data into Power BI), DirectQuery (querying Snowflake directly each time a report is run) and Dual (a mode that lets a table behave as either cached or uncached, depending on the query), it is crucial to consider the needs of each company.
Each approach has its particular advantages and limitations.
To minimize latency and limit network bandwidth consumption, we must carefully consider the location of our Power BI tenant or Premium capacity relative to the location of our Snowflake account.
When using Snowflake on Azure, it is recommended to store Snowflake data in the same Azure region as the default region of the Power BI tenant. This alignment of locations makes the connection more efficient and optimizes overall system performance.
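As a quick check, Snowflake exposes the account's cloud and region through a built-in context function, which can be compared against the default region shown in the Power BI admin portal. A minimal sketch (the example output is illustrative):

```sql
-- Check which cloud and region the Snowflake account runs in,
-- so it can be aligned with the Power BI tenant's default region.
SELECT CURRENT_REGION();   -- e.g. 'AZURE_WESTEUROPE'
```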
The strategic investment in building BI and analytics using Snowflake should not be compromised by an inadequate configuration in Power BI. The essential premise lies in maximizing query performance through parallelization, seeking to run as many DAX or MDX queries as possible simultaneously. This parallelization approach decreases the impact of delays from data sources and network latencies on overall query performance.
Increasing the number of connections can make a noticeable difference, especially on report pages with numerous Power BI visuals or when some visuals generate slower queries.
It is important to note that query parallelization does not reduce the number of SQL queries generated by Power BI, but it does decrease times by running queries concurrently rather than sequentially.
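One way to confirm from the Snowflake side that queries are actually running concurrently is to inspect the query history for the warehouse Power BI uses and look for overlapping execution windows. A minimal sketch using the standard QUERY_HISTORY_BY_WAREHOUSE table function; the warehouse name is a placeholder:

```sql
-- List recent queries on the Power BI warehouse; overlapping
-- start/end times indicate the queries ran in parallel.
SELECT query_id, start_time, end_time, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
       WAREHOUSE_NAME => 'POWERBI_WH',   -- placeholder name
       RESULT_LIMIT   => 100))
ORDER BY start_time DESC;
```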
Enabling SSO on the connection between Power BI and Snowflake means that the identity of the user running the report is also used to send SQL queries generated by Power BI to Snowflake. This practice ensures that reports respect the security rules defined in Snowflake and improves data governance and management in Power BI.
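On the Snowflake side, SSO from Power BI is typically enabled through an external OAuth security integration with Azure AD. The sketch below follows the pattern documented by Snowflake for Power BI SSO; the integration name and tenant ID are placeholders, and the exact issuer and mapping values depend on your environment:

```sql
-- External OAuth integration so that Power BI can pass the report
-- user's Azure AD identity through to Snowflake (values illustrative).
CREATE SECURITY INTEGRATION powerbi_sso
  TYPE = EXTERNAL_OAUTH
  ENABLED = TRUE
  EXTERNAL_OAUTH_TYPE = AZURE
  EXTERNAL_OAUTH_ISSUER = 'https://sts.windows.net/<tenant-id>/'
  EXTERNAL_OAUTH_JWS_KEYS_URL = 'https://login.windows.net/common/discovery/keys'
  EXTERNAL_OAUTH_AUDIENCE_LIST = ('https://analysis.windows.net/powerbi/connector/Snowflake')
  EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'upn'
  EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'login_name';
```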
Although the Snowflake connector offers the ability to employ custom SQL queries as a source for tables in your dataset, Snowflake advises caution in this choice.
This practice, while viable, can trigger performance challenges during development in Power BI Desktop and complicate the task of maintaining the solution, as it introduces SQL code directly into the dataset definition.
However, it is worth noting that the October 2023 release of Power BI significantly improves the performance of "Custom SQL" queries against Snowflake. Power BI Desktop users are urged to upgrade to take advantage of this enhancement, which is deployed automatically in the Power BI service.
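Where custom SQL is currently embedded in a dataset, a common alternative is to move that logic into a Snowflake view and point Power BI at the view instead. A minimal sketch (database, schema and column names are illustrative):

```sql
-- Encapsulate the query logic in a view so the Power BI dataset
-- references a named object instead of embedding raw SQL.
CREATE OR REPLACE VIEW analytics.reporting.v_sales_summary AS
SELECT s.order_date,
       c.region,
       SUM(s.amount) AS total_amount
FROM analytics.raw.sales s
JOIN analytics.raw.customers c
  ON c.customer_id = s.customer_id
GROUP BY s.order_date, c.region;
```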
Snowflake recommends allocating dedicated warehouses to different Power BI projects or semantic models, so that each workload is isolated and can be sized and monitored independently.
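A minimal sketch of a warehouse dedicated to a single Power BI semantic model; the name, size, timeout and role are illustrative and should be tuned per project:

```sql
-- Dedicated warehouse for one Power BI workload, isolated from
-- other jobs and sized independently.
CREATE WAREHOUSE IF NOT EXISTS powerbi_sales_wh
  WITH WAREHOUSE_SIZE = 'MEDIUM'
       AUTO_SUSPEND   = 60      -- seconds of inactivity before suspending
       AUTO_RESUME    = TRUE;

-- Grant usage to the role the Power BI connection authenticates with.
GRANT USAGE ON WAREHOUSE powerbi_sales_wh TO ROLE powerbi_reader;
```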
In addition, when working with Power BI and Snowflake together, certain techniques help optimize performance. The most prominent are:
The way we model our data has a significant impact on the performance of Power BI. Unlike many other BI tools, Power BI is specifically designed to work best with dimensional models and star schemas rather than with large, flat tables.
Also, for the joint use of Snowflake and Power BI, it is recommended to perform all data modeling work in Snowflake before performing any tasks in Power BI. This approach ensures a solid and optimized foundation for subsequent analysis in Power BI, thus maximizing the efficiency of the process.
Likewise, Snowflake advises against using Power BI features, such as the Power Query Editor or calculated columns, to compensate for data modeling deficiencies.
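In practice, this often means exposing a star schema from Snowflake as dimension and fact views that Power BI consumes directly. A simplified sketch with illustrative names:

```sql
-- Dimension view: one row per customer, with the attributes reports slice by.
CREATE OR REPLACE VIEW analytics.marts.dim_customer AS
SELECT customer_id, customer_name, segment, region
FROM analytics.raw.customers;

-- Fact view: one row per order line, carrying keys and additive measures.
CREATE OR REPLACE VIEW analytics.marts.fct_orders AS
SELECT order_id, customer_id, order_date, quantity, amount
FROM analytics.raw.order_lines;
```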
Snowflake can benefit from Power BI's aggregations and composite modeling capabilities.
The aggregations feature in Power BI allows you to incorporate tables with pre-aggregated data into the dataset, thus improving query performance. These aggregations can be stored in Import or DirectQuery mode. In addition, the implementation of hidden aggregation tables in Power BI semantic models allows queries to be optimized by directing them to the appropriate table as needed.
In DirectQuery environments, these aggregation tables can solve problems similar to those addressed by materialized views in Snowflake, and it is worth evaluating both options to determine which offers the greater performance advantage.
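As a point of comparison, a Snowflake materialized view that pre-aggregates a single fact table might look like the sketch below (materialized views are an Enterprise Edition feature, cannot contain joins, and the names here are illustrative):

```sql
-- Pre-aggregated daily totals maintained automatically by Snowflake;
-- queries at this grain can be served from the smaller object.
CREATE MATERIALIZED VIEW analytics.marts.mv_daily_sales AS
SELECT order_date,
       customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM analytics.raw.order_lines
GROUP BY order_date, customer_id;
```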
Finally, it is important to remember that Power BI automatic aggregations are also compatible with Snowflake, except when SSO is enabled.
When working with Snowflake in Power BI, it is important to limit the number of visuals on a page, since each visual typically generates at least one query against the source.
It is critical to consider factors such as the specific workload, data volume, and data modeling and reporting methodology. In addition, when writing calculations, it is important to evaluate the most efficient strategy, for example preferring Data Analysis Expressions (DAX) measures over calculated columns. Extensive testing on each client's particular workload is essential to determine the most beneficial experience for their user communities.
Careful selection of tables, rows and columns is essential to meet business requirements. In this context, when working with Power BI, the premise that "more data is not always the solution" is fundamental.
In this regard, it is recommended to remove columns that are not absolutely necessary, as they not only consume memory, but also contribute to complexity and potentially increase the volume of data, ultimately negatively affecting performance.
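One way to enforce this on the Snowflake side is to expose narrow views that carry only the columns and history that reports actually use, rather than connecting Power BI to wide base tables. A minimal sketch with illustrative names and an illustrative two-year window:

```sql
-- Keep only the required columns and restrict history to what
-- reports use, reducing both model size and query cost.
CREATE OR REPLACE VIEW analytics.marts.v_orders_recent AS
SELECT order_id, order_date, customer_id, amount
FROM analytics.raw.order_lines
WHERE order_date >= DATEADD(year, -2, CURRENT_DATE());
```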
When working with Power BI, it is important to take advantage of the query reduction options to limit excessive query generation. This strategy is especially useful when incorporating filters into a visualization, since generating too many queries can negatively affect the performance of Power BI.
Although the default setting in Power BI sets a left outer join on relationships, enabling the "Assume referential integrity" option makes it possible to force an inner join. This setting can result in more agile queries. By relying on data integrity in Snowflake tables, enabling this feature allows Power BI to generate more efficient SQL queries for data retrieval.
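The effect is visible in the SQL that Power BI generates: with the default setting the relationship translates to a LEFT OUTER JOIN, while "Assume referential integrity" allows an INNER JOIN, which Snowflake can generally execute more efficiently. An illustrative before/after with placeholder table names (the actual generated SQL will differ):

```sql
-- Default: every fact row is kept, even if the dimension key is missing.
SELECT d.region, SUM(f.amount)
FROM fct_orders f
LEFT OUTER JOIN dim_customer d ON d.customer_id = f.customer_id
GROUP BY d.region;

-- With "Assume referential integrity": an inner join, valid only if
-- every fact key really exists in the dimension table.
SELECT d.region, SUM(f.amount)
FROM fct_orders f
INNER JOIN dim_customer d ON d.customer_id = f.customer_id
GROUP BY d.region;
```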
In Power BI Premium environments, adjusting the workload settings is crucial to handle datasets efficiently. Of particular note is the "Maximum intermediate row set count" parameter, which controls the maximum number of rows that can be returned by a SQL query in DirectQuery mode.
This limit is most often reached when modeling complex relationships or writing elaborate DAX expressions. Rather than simply raising the value, which may merely defer the problem or create other performance issues, it is better to rethink the data structure or rewrite the DAX expressions involved.