7 Optimize A Model For Performance in Power BI
7 Optimize A Model For Performance in Power BI
7 Optimize A Model For Performance in Power BI
performance in
Power BI
1 hr 9 min
Module
8 Units
4.8 (6,984)
Rate it
Intermediate
Data Analyst
Power BI
Microsoft Power Platform
Learning objectives
By the end of this module, you will be able to:
Review the performance of measures, relationships, and visuals.
Use variables to improve performance and troubleshooting.
Improve performance by reducing cardinality levels.
Optimize DirectQuery models with table level storage.
Create and manage aggregations.
StartSave
Prerequisites
None
You might find that your report runs well in test and development environments, but when
deployed to production for broader consumption, performance issues arise. From a report
user's perspective, poor performance is characterized by report pages that take longer to load
and visuals taking more time to update. This poor performance results in a negative user
experience.
As a data analyst, you will spend approximately 90 percent of your time working with your
data, and nine times out of ten, poor performance is a direct result of a bad data model, bad
Data Analysis Expressions (DAX), or the mix of the two. The process of designing a data
model for performance can be tedious, and it is often underestimated. However, if you
address performance issues during development, you will have a robust Power BI data model
that will return better reporting performance and a more positive user experience. Ultimately,
you will also be able to maintain optimized performance. As your organization grows, the
size of its data grows, and its data model becomes more complex. By optimizing your data
model early, you can mitigate the negative impact that this growth might have on the
performance of your data model.
A smaller sized data model uses less resources (memory) and achieves faster data refresh,
calculations, and rendering of visuals in reports. Therefore, the performance optimization
process involves minimizing the size of the data model and making the most efficient use of
the data in the model, which includes:
For example, assume that you work as a Microsoft Power BI developer for Tailwind Traders.
You have been given a task to review a data model that was built a few years ago by another
developer, a person who has since left the organization.
The data model produces a report that has received negative feedback from users. The users
are happy with the results that they see in the report, but they are not satisfied with the report
performance. Loading the pages in the report is taking too long, and tables are not refreshing
quickly enough when certain selections are made. In addition to this feedback, the IT team
has highlighted that the file size of this particular data model is too large, and it is putting a
strain on the organization's resources.
You need to review the data model to identify the root cause of the performance issues and
make changes to optimization performance.
19 minutes
If your data model has multiple tables, complex relationships, intricate calculations, multiple
visuals, and redundant data, a potential exists for poor report performance. The poor
performance of a report leads to a negative user experience.
To optimize performance, you must first identify where the problem is coming from; in other
words, find out which elements of your report and data model are causing the performance
issues. Afterward, you can take action to resolve those issues and, therefore, improve
performance.
You should review the measures and queries in your data model to ensure that you are using
the most efficient way to get the results that you want. Your starting point should be to
identify bottlenecks that exist in the code. When you identify the slowest query in the data
model, you can focus on the biggest bottleneck first and establish a priority list to work
through the other issues.
Analyze performance
You can use Performance analyzer in Power BI Desktop to help you find out how each of
your report elements are performing when users interact with them. For example, you can
determine how long it takes for a particular visual to refresh when it is initiated by a user
interaction. Performance analyzer will help you identify the elements that are contributing
to your performance issues, which can be useful during troubleshooting.
Before you run Performance analyzer, to ensure you get the most accurate results in your
analysis (test), make sure that you start with a clear visual cache and a clear data engine
cache.
Visual cache - When you load a visual, you can't clear this visual cache without
closing Power BI Desktop and opening it again. To avoid any caching in play,
you need to start your analysis with a clean visual cache.
To ensure that you have a clear visual cache, add a blank page to your Power BI
Desktop (.pbix) file and then, with that page selected, save and close the file.
Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open
on the blank page.
Data engine cache - When a query is run, the results are cached, so the results
of your analysis will be misleading. You need to clear the data cache before
rerunning the visual.
To clear the data cache, you can either restart Power BI Desktop or connect
DAX Studio to the data model and then call Clear Cache.
When you have cleared the caches and opened the Power BI Desktop file on the blank page,
go to the View tab and select the Performance analyzer option.
To begin the analysis process, select Start recording, select the page of the report that you
want to analyze, and interact with the elements of the report that you want to measure. You
will see the results of your interactions display in the Performance analyzer pane as you
work. When you are finished, select the Stop button.
For more detailed information, see Use Performance Analyzer to examine report element
performance.
Review results
You can review the results of your performance test in the Performance analyzer pane. To
review the tasks in order of duration, longest to shortest, right-click the Sort icon next to
the Duration (ms) column header, and then select Total time in Descending order.
The log information for each visual shows how much time it took (duration) to complete the
following categories of tasks:
DAX query - The time it took for the visual to send the query, along with the
time it took Analysis Services to return the results.
Visual display - The time it took for the visual to render on the screen,
including the time required to retrieve web images or geocoding.
Other - The time it took the visual to prepare queries, wait for other visuals to
complete, or perform other background processing tasks. If this category
displays a long duration, the only real way to reduce this duration is to optimize
DAX queries for other visuals, or reduce the number of visuals in the report.
The results of the analysis test help you to understand the behavior of your data model and
identify the elements that you need to optimize. You can compare the duration of each
element in the report and identify the elements that have a long duration. You should focus
on those elements and investigate why it takes them so long to load on the report page.
To analyze your queries in more detail, you can use DAX Studio, which is a free, open-
source tool that is provided by another service.
Visuals
If you identify visuals as the bottleneck leading to poor performance, you should find a way
to improve performance with minimal impact to user experience.
Consider the number of visuals on the report page; fewer visuals means better performance.
Ask yourself if a visual is really necessary and if it adds value to the end user. If the answer is
no, you should remove that visual. Rather than using multiple visuals on the page, consider
other ways to provide additional details, such as drill-through pages and report page tooltips.
Examine the number of fields in each visual. The more visuals you have on the report, the
higher chance for performance issues. In addition, the more visuals, the more the report can
appear crowded and lose clarity. The upper limit for visuals is 100 fields (measures or
columns), so a visual with more than 100 fields will be slow to load. Ask yourself if you
really need all of this data in a visual. You might find that you can reduce the number of
fields that you currently use.
DAX query
When you examine the results in the Performance analyzer pane, you can see how long it
took the Power BI Desktop engine to evaluate each query (in milliseconds). A good starting
point is any DAX query that is taking longer than 120 milliseconds. In this example, you
identify one particular query that has a large duration time.
Performance analyzer highlights potential issues but does not tell you what needs to be
done to improve them. You might want to conduct further investigation into why this
measure takes so long to process. You can use DAX Studio to investigate your queries in
more detail.
For example, select Copy Query to copy the calculation formula onto the clipboard, then
paste it into Dax Studio. You can then review the calculation step in more detail. In this
example, you are trying to count the total number of products with order quantities greater
than or equal to five.
Copy
Count Customers =
CALCULATE (
DISTINCTCOUNT ( Order[ProductID] ),
FILTER ( Order, Order[OrderQty] >= 5 )
)
After analyzing the query, you can use your own knowledge and experience to identify where
the performance issues are. You can also try using different DAX functions to see if they
improve performance. In the following example, the FILTER function was replaced with the
KEEPFILTER function. When the test was run again in Performance analyzer, the duration
was shorter as a result of the KEEPFILTER function.
Copy
Count Customers =
CALCULATE (
DISTINCTCOUNT ( Order[ProductID] ),
KEEPFILTERS (Order[OrderQty] >= 5 )
)
In this case, you can replace the FILTER function with the KEEPFILTER function to
significantly reduce the evaluation duration time for this query. When you make this change,
to check whether the duration time has improved or not, clear the data cache and then rerun
the Performance analyzer process.
Data model
If the duration of measures and visuals are displaying low values (in other words they have a
short duration time), they are not the reason for the performance issues. Instead, if the DAX
query is displaying a high duration value, it is likely that a measure is written poorly or an
issue has occurred with the data model. The issue might be caused by the relationships,
columns, or metadata in your model, or it could be the status of the Auto date/time option, as
explained in the following section.
Relationships
You should review the relationships between your tables to ensure that you have established
the correct relationships. Check that relationship cardinality properties are correctly
configured. For example, a one-side column that contains unique values might be incorrectly
configured as a many-side column. You will learn more about how cardinality affects
performance later in this module.
Columns
It is best practice to not import columns of data that you do not need. To avoid deleting
columns in Power Query Editor, you should try to deal with them at the source when loading
data into Power BI Desktop. However, if it is impossible to remove redundant columns from
the source query or the data has already been imported in its raw state, you can always use
Power Query Editor to examine each column. Ask yourself if you really need each column
and try to identify the benefit that each one adds to your data model. If you find that a column
adds no value, you should remove it from your data model. For example, suppose that you
have an ID column with thousands of unique rows. You know that you won't use this
particular column in a relationship, so it will not be used in a report. Therefore, you should
consider this column as unnecessary and admit that it is wasting space in your data model.
When you remove an unnecessary column, you will reduce the size of the data model which,
in turn, results in a smaller file size and faster refresh time. Also, because the dataset contains
only relevant data, the overall report performance will be improved.
For more information, see Data reduction techniques for Import modeling.
Metadata
Metadata is information about other data. Power BI metadata contains information on your
data model, such as the name, data type and format of each of the columns, the schema of the
database, the report design, when the file was last modified, the data refresh rates, and much
more.
When you load data into Power BI Desktop, it is good practice to analyze the corresponding
metadata so you can identify any inconsistences with your dataset and normalize the data
before you start to build reports. Running analysis on your metadata will improve data model
performance because, while analyzing your metadata, you will identify unnecessary columns,
errors within your data, incorrect data types, the volume of data being loaded (large datasets,
including transactional or historic data, will take longer to load), and much more.
You can use Power Query Editor in Power BI Desktop to examine the columns, rows, and
values of the raw data. You can then use the available tools, such as those highlighted in the
following screenshot, to make the necessary changes.
Unnecessary columns - Evaluates the need for each column. If one or more
columns will not be used in the report and are therefore unnecessary, you should
remove them by using the Remove Columns option on the Home tab.
Unnecessary rows - Checks the first few rows in the dataset to see if they are
empty or if they contain data that you do not need in your reports; if so, it
removes those rows by using the Remove Top Rows option on the Home tab.
Data type - Evaluates the column data types to ensure that each one is correct.
If you identify a data type that is incorrect, change it by selecting the column,
selecting Data Type on the Transform tab, and then selecting the correct data
type from the list.
Query names - Examines the query (table) names in the Queries pane. Just like
you did for column header names, you should change uncommon or unhelpful
query names to names that are more obvious or names that the user is more
familiar with. You can rename a query by right-clicking that query,
selecting Rename, editing the name as required, and then pressing Enter.
Column details - Power Query Editor has the following three data preview
options that you can use to analyze the metadata that is associated with your
columns. You can find these options on the View tab, as illustrated in the
following screenshot.
o Column quality - Determines what percentage of items in the
column are valid, have errors, or are empty. If the Valid percentage is
not 100, you should investigate the reason, correct the errors, and
populate empty values.
o Column distribution - Identifies how many distinct items you have
and how many are unique. This information is useful when you want
to identify the cardinality of a column. You will investigate this
further later in this module.
o Column profile - Shows more statistics for the column and a chart
showing the distribution of the unique items.
Note
If you are reviewing a large dataset with more than 1,000 rows, and you want to analyze that
whole dataset, you need to change the default option at the bottom of the window.
Select Column profiling based on top 1000 rows > Column profiling based on entire
data set.
Other metadata that you should consider is the information about the data model as a whole,
such as the file size and data refresh rates. You can find this metadata in the associated Power
BI Desktop (.pbix) file. The data that you load into Power BI Desktop is compressed and
stored to the disk by the VertiPaq storage engine. The size of your data model has a direct
impact on its performance; a smaller sized data model uses less resources (memory) and
achieves faster data refresh, calculations, and rendering of visuals in reports.
The Auto date/time option allows you to work with time intelligence when filtering,
grouping, and drilling down through calendar time periods. We recommend that you keep
the Auto date/time option enabled only when you work with calendar time periods and when
you have simplistic model requirements in relation to time.
If your data source already defines a date dimension table, that table should be used to
consistently define time within your organization, and you should disable the global Auto
date/time option. Disabling this option can lower the size of your data model and reduce the
refresh time.
You can enable/disable this Auto date/time option globally so that it applies to all of your
Power BI Desktop files, or you can enable/disable the option for the current file so that it
applies to an individual file only.
You can use variables in your DAX formulas to help you write less complex and more
efficient calculations. Variables are underused by developers who are starting out in Power
BI Desktop, but they are effective and you should use them by default when you are creating
measures.
Some expressions involve the use of many nested functions and the reuse of expression logic.
These expressions take a longer time to process and are difficult to read and, therefore,
troubleshoot. If you use variables, you can save query processing time. This change is a step
in the right direction toward optimizing the performance of a data model.
The use of variables in your data model provides the following advantages:
The second row of the table shows the improved measure definition. This definition uses the
VAR keyword to introduce a variable named SalesPriorYear, and it uses an expression to
assign the "same period last year" result to that new variable. It then uses the variable twice in
the RETURN expression.
Without variable
DAXCopy
Sales YoY Growth =
DIVIDE (
( [Sales] - CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
),
CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
)
With variable
DAXCopy
Sales YoY Growth =
VAR SalesPriorYear =
CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
DIVIDE ( ( [Sales] - SalesPriorYear ), SalesPriorYear )
RETURN
SalesVariance
In the first measure definition in the table, the formula is inefficient because it requires Power
BI to evaluate the same expression twice. The second definition is more efficient because,
due to the variable, Power BI only needs to evaluate the PARALLELPERIOD expression
once.
If your data model has multiple queries with multiple measures, the use of variables could cut
the overall query processing time in half and improve the overall performance of the data
model. Furthermore, this solution is a simple one; imagine the savings as the formulas get
more complicated, for instance, when you are dealing with percentages and running totals.
When using variables, it is best practice to use descriptive names for the variables. In the
previous example, the variable is called SalesPriorYear, which clearly states what the
variable is calculating. Consider the outcome of using a variable that was
called X, temp or variable1; the purpose of the variable would not be clear at all.
Using clear, concise, meaningful names will help make it easier for you to understand what
you are trying to calculate, and it will be much simpler for other developers to maintain the
report in the future.
In the following example, you test an expression that is assigned to a variable. In order to
debug you temporarily rewrite the RETURN expression to write to the variable. The measure
definition returns only the SalesPriorYear variable because that is what comes after the
RETURN expression.
DAXCopy
Sales YoY Growth % =
VAR SalesPriorYear = CALCULATE([Sales], PARALLELPERIOD('Date'[Date], -12, MONTH))
VAR SalesPriorYear% = DIVIDE(([Sales] - SalesPriorYear), SalesPriorYear)
RETURN SalesPriorYear%
The RETURN expression will display the SalesPriorYear value only. This technique allows
you to revert the expression when you have completed the debugging. It also makes
calculations simpler to understand due to reduced complexity of the DAX code.
Reduce cardinality
Completed100 XP
7 minutes
When you create or edit a relationship, you can configure additional options. By
default, Power BI Desktop automatically configures additional options based on its
best guess, which can be different for each relationship based on the data in the
columns.
The relationships can have different cardinality. Cardinality is the direction of the
relationship, and each model relationship must be defined with a cardinality type.
The cardinality options in Power BI are:
During development, you will be creating and editing relationships in your model, so
when you are building new relationships in your model, regardless of what
cardinality you have chosen, always ensure that both of the columns that you are
using to participate in a relationship are sharing the same data type. Your model will
never work if you try to build a relationship between two columns, where one
column has a text data type and another column has an integer data type.
For example, a source sales fact table stores one row for each order line. Significant
data reduction could be achieved by summarizing all sales metrics if you group by
date, customer, and product, and individual transaction detail is not needed.
Consider, then, that an even more significant data reduction could be achieved by
grouping by date at month level. It could achieve a possible 99 percent reduction in
model size; but, reporting at day level or an individual order level is no longer
possible. Deciding to summarize fact-type data will always involve a tradeoff with the
detail of your data. A disadvantage is that you may lose the ability to drill into data
because the detail no longer exists. This tradeoff could be mitigated by using a
mixed model design.
18 minutes
DirectQuery is one way to get data into Power BI Desktop. The DirectQuery method involves
connecting directly to data in its source repository from within Power BI Desktop. It is an
alternative to importing data into Power BI Desktop.
When you use the DirectQuery method, the overall user experience depends heavily on the
performance of the underlying data source. Slow query response times will lead to a negative
user experience and, in the worst-case scenarios, queries might time out. Also, the number of
users who are opening the reports at any one time will impact the load that is placed on the
data source. For example, if your report has 20 visuals in it and 10 people are using the
report, 200 queries or more will exist on the data source because each visual will issue one or
more queries.
Unfortunately, the performance of your Power BI model will not only be impacted by the
performance of the underlying data source, but also by other uncontrollable factors, such as:
Therefore, using DirectQuery poses a risk to the quality of your model's performance. To
optimize performance in this situation, you need to have control over, or access to, the source
database.
It is suitable in cases where data changes frequently and near real-time reporting
is required.
It can handle large data without the need to pre-aggregate.
It applies data sovereignty restrictions to comply with legal requirements.
It can be used with a multidimensional data source that contains measures such
as SAP Business Warehouse (BW).
If your organization needs to use DirectQuery, you should clearly understand its behavior
within Power BI Desktop and be aware of its limitations. You will then be in a good position
to take action to optimize the DirectQuery model as much as possible.
When you use DirectQuery to connect to data in Power BI Desktop, that connection behaves
in the following way:
When you initially use the Get Data feature in Power BI Desktop, you will
select the source. If you connect to a relational source, you can select a set of
tables and each one will define a query that logically returns a set of data. If you
select a multidimensional source, such as SAP BW, you can only select the
source.
When you load the data, no data is imported into the Power BI Desktop, only
the schema is loaded. When you build a visual within Power BI Desktop,
queries are sent to the underlying source to retrieve the necessary data. The time
it takes to refresh the visual depends on the performance of the underlying data
source.
If changes are made to the underlying data, they won't be immediately reflected
in the existing visuals in Power BI due to caching. You need to carry out a
refresh to see those changes. The necessary queries are present for each visual,
and the visuals are updated accordingly.
When you publish the report to the Power BI service, it will result in a dataset in
Power BI service, the same as for import. However, no data is included with
that dataset.
When you open an existing report in Power BI service, or build a new one, the
underlying source is again queried to retrieve the necessary data. Depending on
the location of the original source, you might have to configure an on-premises
data gateway.
You can pin visuals, or entire report pages, as dashboard tiles. The tiles are
automatically refreshed on a schedule, for example, every hour. You can control
the frequency of this refresh to meet your requirements. When you open a
dashboard, the tiles reflect the data at the time of the last refresh and might not
include the latest changes that are made to the underlying data source. You can
always refresh an open dashboard to ensure that it's up-to-date.
Limitations of DirectQuery connections
The use of DirectQuery can have negative implications. The limitations vary, depending on
the specific data source that is being used. You should take the following points into
consideration:
Now that you have a brief understanding of how DirectQuery works and the limitations that it
poses, you can take action to improve the performance.
Optimize performance
Continuing with the Tailwind Traders scenario, during your review of the data model, you
discover that the query used DirectQuery to connect Power BI Desktop to the source data.
This use of DirectQuery is the reason why users are experiencing poor report performance.
It's taking too long to load the pages in the report, and tables are not refreshing quickly
enough when certain selections are made. You need to take action to optimize the
performance of the DirectQuery model.
You can examine the queries that are being sent to the underlying source and try to identify
the reason for the poor query performance. You can then make changes in Power BI Desktop
and the underlying data source to optimize overall performance.
Optimize data in Power BI Desktop
When you have optimized the data source as much as possible, you can take further action
within Power BI Desktop by using Performance analyzer, where you can isolate queries to
validate query plans.
You can analyze the duration of the queries that are being sent to the underlying source to
identify the queries that are taking a long time to load. In other words, you can identify where
the bottlenecks exist.
You don't need to use a special approach when optimizing a DirectQuery model; you can
apply the same optimization techniques that you used on the imported data to tune the data
from the DirectQuery source. For example, you can reduce the number of visuals on the
report page or reduce the number of fields that are used in a visual. You can also remove
unnecessary columns and rows.
For more detailed guidance on how to optimize a DirectQuery query, see: DirectQuery model
guidance in Power BI Desktop and Guidance for using DirectQuery successfully.
Your first stop is the data source. You need to tune the source database as much as possible
because anything you do to improve the performance of that source database will in turn
improve Power BI DirectQuery. The actions that you take in the database will do the most
good.
Consider the use of the following standard database practices that apply to most situations:
Refer to the guidance documents of your data source and implement their performance
recommendations.
Power BI Desktop gives you the option to send fewer queries and to disable certain
interactions that will result in a poor experience if the resulting queries take a long time to
run. Applying these options prevents queries from continuously hitting the data source, which
should improve performance.
In this example, you edit the default settings to apply the available data reduction options to
your model. You access the settings by selecting File > Options and settings > Options,
scrolling down the page, and then selecting the Query reduction option.
The following query reduction options are available:
Reduce number of queries sent by - By default, every visual interacts with
every other visual. Selecting this check box disables that default interaction.
You can then optionally choose which visuals interact with each other by using
the Edit interactions feature.
Slicers - By default, the Instantly apply slicer changes option is selected. To
force the report users to manually apply slicer changes, select the Add an apply
button to each slicer to apply changes when you're ready option.
Filters - By default, the Instantly apply basic filter changes option is selected.
To force the report users to manually apply filter changes, select one of the
alternative options:
o Add an apply button to all basic filters to apply changes when
you're ready
o Add a single apply button to the filter pane to apply changes at
once (preview)
When aggregating data, you summarize that data and present it in at a higher grain (level).
For example, you can summarize all sales data and group it by date, customer, product, and
so on. The aggregation process reduces the table sizes in the data model, allowing you to
focus on important data and helping to improve the query performance.
Your organization might decide to use aggregations in their data models for the following
reasons:
If you are dealing with a large amount of data (big data), aggregations will
provide better query performance and help you analyze and reveal the insights
of this large data. Aggregated data is cached and, therefore, uses a fraction of
the resources that are required for detailed data.
If you are experiencing a slow refresh, aggregations will help you speed up the
refresh process. The smaller cache size reduces the refresh time, so data gets to
users faster. Instead of refreshing what could be millions of rows, you would
refresh a smaller amount of data instead.
If you have a large data model, aggregations can help you reduce and maintain
the size of your model.
If you anticipate your data model growing in size in the future, you can use
aggregations as a proactive step toward future proofing your data model by
lessening the potential for performance and refresh issues and overall query
problems.
Continuing with the Tailwind Traders scenario, you have taken several steps to optimize the
performance of the data model, but the IT team has informed you that the file size is still too
large. The file size is currently 1 gigabyte (GB), so you need to reduce it to around 50
megabytes (MB). During your performance review, you identified that the previous
developer did not use aggregations in the data model, so you now want to create some
aggregations for the sales data to reduce the file size and further optimize the performance.
Create aggregations
Before you start creating aggregations, you should decide on the grain (level) on which you
want to create them. In this example, you want to aggregate the sales data at the day level.
When you decide on the grain, the next step is to decide on how you want to create the
aggregations. You can create aggregations in different ways and each method will yield the
same results, for example:
If you have access to the database, you could create a table with the aggregation
and then import that table into Power BI Desktop.
If you have access to the database, you could create a view for the aggregation
and then import that view into Power BI Desktop.
In Power BI Desktop, you can use Power Query Editor to create the
aggregations step-by-step.
In this example, you open a query in Power Query Editor and notice that the data has not
been aggregated; it has over 999 rows, as illustrated the following screenshot.
Select the Close and Apply button to close Power Query Editor and apply the changes to
your data model. Return to the Power BI Desktop page and then select the Refresh button to
see the results. Observe the screen because a brief message will display the number of rows
that your data model now has. This number of rows should be significantly less than the
number that you started with. You can also see this number when you open Power Query
Editor again, as illustrated in the following screenshot. In this example, the number of rows
was reduced to 30.
Remember, you started with over 999 rows. Using aggregation has significantly reduced the
number of rows in your dataset, which means that Power BI has less data to refresh and your
model should perform better.
Manage aggregations
When you have created aggregations, you can manage those aggregations in Power BI
Desktop and make changes to their behavior, if required.
You can open the Manage Aggregations window from any view in Power BI Desktop. In
the Fields pane, right-click the table and then select Manage aggregations.
For each aggregation column, you can select an option from the Summarization drop-down
list and make changes to the selected detail table and column. When you are finished
managing the aggregations, select Apply All.
For more detailed information on how to create and manage aggregations, see Use
aggregations in Power BI Desktop.
The benefit of analyzing the metadata is that you can clearly identify data
inconsistences with your dataset.
The benefit of analyzing the metadata is to get familiar with your data.
The benefit of analyzing the metadata is to know the number of rows, columns and
tables being loaded into your model.
2.
What can be achieved by removing unnecessary rows and columns?
It is not necessary to delete unnecessary rows and columns and it is a good practice
to keep all metadata intact.
Deleting unnecessary rows and columns will reduce a dataset size and its good
practice to load only necessary data into your data model.
Deleting unnecessary rows and columns can damage the structure of the data model.
3.
Is it possible to create a relationship between two columns if they are different DATA
TYPE columns?
No, both columns in a relationship must be sharing the same DATA TYPE.
Summary
Completed100 XP
3 minutes
In this module's scenario, one of your organization's Power BI Desktop data models
was inefficient and causing problems. Users were dissatisfied with the report
performance, and the model's file size was too large, so it was putting a strain on the
organization's resources.
You were asked to review the data model to identify the cause of the performance
issues and make changes to optimize performance and reduce the size of the model.
Power BI Desktop provides a range of tools and features for you to analyze and
optimize the performance of its data models. You started the optimization process
by using Performance analyzer and other tools to review the performance of
measures, relationships, and visuals, and then made improvements based on the
analysis results. Next, you used variables to write less complex and more efficient
calculations. You then took a closer look at the column distribution and reduced the
cardinality of your relationships. At that stage, the data model was more optimized.
You considered how the situation would be different if your organization used a
DirectQuery model, and then you identified how to optimize performance from
Power BI Desktop and the source database. Finally, you used aggregations to
significantly reduce the size of the data model.
If Power BI Desktop did not give you the opportunity to optimize inefficient data
models, you would have to spend a lot of time in your multiple data sources to
improve the data there. In particular, without Performance Analyzer, you wouldn't
have identified the reasons for the performance issues in your reports and the
bottlenecks in the queries that need to be cleared. As a result, users would be
frustrated and unmotivated and might avoid using the reports.
Now that you have optimized the report, users can access the data that they need in
a faster time, so they are more productive and have greater job satisfaction. The
reduction that you made to the model's file size will ease the strain on resources,
bringing about a range of benefits to your organization. You have successfully
accomplished the task you were given.