Data Transformation

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 27

Data Transformation

1. Overview of Power Query Editor


Power BI Desktop has three views:
● Report view - where you use queries you create to build compelling
visualizations, arranged as you want them to appear, and with
multiple pages. that you can share with others.
● Data view - seed the data in your report in data model format,
where you can add measures, create new columns, and manage
relationships.
● Model view - get a graphical representation of the relationships that
have been established in your data model, and manage or modify
them as needed.
2. Query Editor interface
To get to Power Query Editor, select Transform data from the Home tab
of Power BI Desktop.

With no data connections, Power Query Editor appears as a blank pane,


ready for data.
Here’s how Power Query Editor appears once a data connection is
established:
- In the ribbon, many buttons are now active to interact with the data
in the query
- In the left pane, queries are listed and available for selection,
viewing, and shaping.
- In the center pane, data from the selected query is displayed and
available for shaping.
- The Query Settings pane appears, listing the query’s properties and
applied steps.

The query ribbon


The ribbon in Power Query Editor consists of four tabs: Home,
Transform, Add Column, View, Tools, and Help.
The Home tab contains the common query tasks.

To connect to data and begin the query building process, select New
Source. A menu appears, providing the most common data sources

The Transform tab provides access to common data transformation


tasks, such as:
- Adding or removing columns
- Changing data types
- Splitting columns
- Others data-driven tasks
The Add Column tab provides additional tasks associated with
adding a column, formatting column data, and adding custom columns.
The following image shows the Add Column tab.

The View tab on the ribbon is used to toggle whether certain panes
or windows are displayed. It's also used to display the Advanced Editor.
The following image shows the View tab.
The left (Queries) pane
The left pane, or Queries pane, displays the number of active queries
and the name of the query. When you select a query from the left pane, its
data is displayed in the center pane, where you can shape and transform
the data to meet your needs.

The center (Data) pane


In the center pane, or Data pane, data from the selected query is
displayed. This pane is where much of the work of the Query view is
accomplished.
The right (Query Settings) pane
The right pane, or Query Settings pane, is where all steps associated
with a query are displayed. For example, in the following image, the
Applied Steps section of the Query Settings pane reflects the fact that we
just changed the type of the Overall score column.
In the Query Settings pane, you can rename steps, delete steps, or
reorder the steps as you see fit. To do so, right-click the step in the
Applied Steps section, and choose from the menu that appears. All query
steps are carried out in the order they appear in the Applied Steps pane.
Advanced Editor
The Advanced Editor lets you see the code that Power Query Editor
is creating with each step. It also lets you create your own shaping code.
To launch the advanced editor, select View from the ribbon, then select
Advanced Editor. A window appears, showing the existing query code.

Saving your work


When your query is where you want it, select Close & Apply from
Power Query Editor's File menu. This action applies the changes and
closes the editor.

As progress is made, Power BI Desktop provides a dialog to display


its status.
To save your work, select File > Save (or File > Save As), as shown in
the following image.

3. Applied Steps
Any transformations to your data will show in the Applied Steps list.
For instance, if you change the first column name, it will display in the
Applied Steps list as Renamed Columns.

Access the Applied Steps list


Select the View tab from the ribbon, and then select Query Settings.
The Query Settings menu will open to the right with the Applied Steps
list.
Edit Applied Step
In the Query Settings menu, you can:
- Rename step
- Delete step
- Delete until end
- Insert step after
- Move step
- Extract the previous steps into query
- Edit step names and their descriptions
In the window, you can change the step name and description and
save the changes by selecting OK.

4. Merge & Append tables


Merge tables
In this case, we want to merge the queries. To do so, follow these steps:
- From the left pane of Power Query Editor, select the query into
which you want the other query to merge. In this case, it's
RetirementStats.
- Select Merge Queries > Merge Queries from the Home tab on the
ribbon.

- When you select the correct matching columns, the OK button is


enabled.

- Select OK.
- To expand the merged table, and select which columns to include,
select the expand icon ( ).
The Expand window appears.

- To apply your changes and close Power Query Editor, select Close &
Apply from the Home ribbon tab.
Append tables
Append means results of two (or more) queries (which are tables
themselves) will be combined into one query in this way:
- Rows will be appended after each other.
- Columns will be the same number of columns for each query.
Let’s follow these step:
- To append these queries, Click on one of them and select Append
Queries from the Combine section of Home tab in Query Editor.
- If you want to keep the existing query result as it is and create a new
query with the appended result choose Append Queries as New,
otherwise just select Append Queries.

- You can choose what is the primary table (normally this is the query
that you have selected before clicking on Append Queries), and the
table to append.

You can also choose to append Three or more tables and add tables
to the list as you wish.
5. Sort & Filter
Sort one column by another column in Power BI
- Select the column that you want to sort. In the Fields pane, the
Column tools tab becomes active.

- Select Sort by Column, then select the field you want to sort the
other field by

- The visual automatically sorts in the order we want:


Format filters in Power BI reports.
There are four standard types of filters that you create in the Filters pane:
- Visual filter applies to a single visual on a report page. You see
visual level filters when you select a visual on the report canvas.
Even if you can't edit a report, you can select a visual and filter it.
- Page filter applies to all the visuals on the report page.
- Report filter applies to all pages in the report.
- Drillthrough filter With drillthrough, you create a destination
report page that focuses on a specific entity, such as a supplier.
From the other report pages, users can right-click a data point for
that entity and drill through to the focused page.
Filter with a field that’s not in the visual:
- In Power BI Desktop, select the Report icon.

- Open the Visualizations, Filters, and Fields panes, if they're not


already open.
- Select a visual to make it active. In this case, it's the scatter chart on
the Overview page. All the fields in the visual are in the
Visualizations pane. They're also listed in the Filters pane, under
the Filters on this visual heading.

- From the Fields pane, select the field you want to add as a new
visual-level filter, and drag it into the Filters on this visual area.
Add a filter to an entire page
- In Power BI Desktop, open the Retail Analysis report.
- Select the Report icon.

- Open the Visualizations, Filters, and Fields panes, if they're not


already open.
- From the Fields pane, select the field you want to add as a new
page-level filter, and drag it to Add data fields here in the Filters on
this page area.
- Select the values you want to filter and set either Basic or Advanced
filtering controls.
Add a report-level filter to filter an entire report
- In Power BI Desktop, select the Report icon

- Open the Visualizations and Filters pane and the Fields pane, if
they're not already open.
- From the Fields pane, select the field you want to add as a new
report-level filter.
- Select the values you want to filter.
- Select the back arrow to return to the previous report page.
Set up the drillthrough target page
- To set up drillthrough, create a target report page that has the
visuals you want for the type of entity that you're going to provide
drillthrough.
- Then, on that drillthrough target page, in the Build visual section of
the Visualizations pane, drag the field for which you want to enable
drillthrough into the Drill through well.

6. Remove & Keep Rows & Columns


Remove & Keep Rows
Remove rows
- Open Query.
- Select a column by clicking the column header. To select more than
one column press Shift + Click or Ctrl + Click on each subsequent
column. The columns can be contiguous or discontinuous.
- Select Home > Remove Rows.
Keep duplicate rows
- Open Query.
- Select a column by clicking the column header. To select more than
one column contiguously or discontinuously, press Shift+Click or
CTRL+Click on each subsequent column.
- Select Home > Keep Rows.
Remove Column
- Open Query.
- Do one or more of the following:
+ To remove a single column, select the column you want to
remove, and then select Home > Remove Columns > Remove
Columns.
+ To remove several columns, select the columns by using Ctrl +
Click or Shift + Click. The columns can be contiguous or
discontiguous.
+ To remove all columns except the selected column, select one
or more columns, and then select Remove Other Columns.
7. Split, Concatenate & Extract
Split column
Split Column by Delimiter
- Select the column you want to split. Ensure that it is a text data
type.
- Select Home > Split Column > By Delimiter. The Split a column by
delimiter dialog box appears.
- In the Select or enter a delimiter drop-down, select Colon, Comma,
Equals Sign, Semicolon, Space, Tab, or Custom. You can also select
Custom to specify any character delimiter.
- Select a Split at option:
+ Left-most delimiter If there are several delimiters, the first
split column is based on the delimiter farthest to the left and
the second split column is based on the rest of the characters
on its right.
+ Right-most delimiter If there are several delimiters, the
second split column is based on the delimiter farthest to the
right and the first split column is based on the rest of the
characters on its left.
+ Each occurrence of the delimiter If there are several
delimiters, split each column by the delimiter.
- Select Show advanced options, and then enter the number of
columns or rows to split into.
- If you choose Custom in Select or enter a delimiter drop-down list,
you may need to enter an alternative quote character or a special
character.
- Select OK.
Split a column by number of characters:
- Select the column you want to split. Ensure the column is a text data
type.
- Select Home > Split Column > By Number of Characters. The Split
a column by Number of Characters dialog box appears.
- In the Number of characters textbox, enter the number of
characters used to split the text column.
- Select a Split option:
+ Once, as far left as possible The first split column is based
on the number of characters counting from the left and the
second split column is based on the rest of the characters on
the right.
+ Once, as far right as possible The second split column is
based on the number of characters counting from the right
and the first split column is based on the rest of the
characters on the left.
+ Repeatedly If the column has many characters, split into
many columns based on the number of characters.
- Select Show advanced options, and then enter the number of
columns to split into.
- Select OK.
Split a column by positions
- Select the column you want to split. Ensure the column is a text data
type.
- Select Home> Split Column > By Positions. The Split Column by
Positions dialog box appears.
- In the Positions box, enter the position numbers to split the text
column.
- Optionally, select Advanced options, and then select one of the
following
+ Columns Each column length is based on the positions you
specified. This is the default action.
+ Rows Instead of a new column, a new row is added based on
the positions you specified.
- Select OK.
Split a column by letter case combinations
You can split a column based on case letter combinations,
lowercase to uppercase or uppercase to lowercase:
Lowercase to uppercase For every instance of two
consecutive letters, the first being lowercase and the second being
uppercase, split the original column so that the second split column
begins with the uppercase letter. For example:
Uppercase to lowercase For every instance of two
consecutive letters the first being uppercase and the second being
lowercase, split the original column so that the second split column
begins with the lowercase letter. For example:

- Select the column you want to split. Ensure that it is a text


data type.
- Do one of the following:
+ Select Home > Split Column > By Lowercase to
Uppercase.
+ Select Home > Split Column > By Uppercase to
Lowercase.
Split a column by digit and non-digit combinations
You want to split a column based on digit and non-digit
combinations, digit to non-digit or non-digit to digit.
Digit to non-digit For every instance of two consecutive
characters, the first being a digit and the second being a non-digit, split
the original column so that the second split column begins with the non-
digit character. For example:

Non-digit to digit For every instance of two consecutive


characters, the first being non-digit and the second being a digit, split the
original column so that the second split column begins with the digit
character. For example:

- Select the column you want to split. Ensure that it is a text


data type.
- Do one of the following:
+ Select Home > Split Column > By Lowercase to
Uppercase.
+ Select Home > Split Column > By Uppercase to
Lowercase.
8. Pivot & Unpivot column
Pivot column
In Power Query, you can create a table that contains an aggregate
value for each unique value in a column. Power Query groups each unique
value, does an aggregate calculation for each value, and pivots the column
into a new table.

To pivot a column
- Select the column that you want to pivot.
- On the Transform tab in the Any column group, select Pivot column.

- By default, Power Query will try to do a sum as the aggregation, but


you can select the Advanced option to see other available
aggregations.
The available options are:
+ Don’t aggregate
+ Count (all)
+ Count (not blank)
+ Minimum
+ Maximum
+ Median
+ Sum
+ Average
Unpivot Column
You might want to unpivot data, sometimes called flattening the
data, to put it in a matrix format so that all similar values are in one
column. This is necessary, for example, to create a chart or a report.
- Select one or more columns. To select more than one column
contiguously or discontinuous, press Shift+Click or
CTRL+Click on each subsequent column.
- Select Transform > Unpivot Columns.
9. Group by/Aggregate
You’ll be using the following sample table.

Where to find the Group by button


You can find the Group by button in three places:
- On the Home tab, in the Transform group.

- On the Transform tab, in the Table group

- On the shortcut menu when you right-click to select columns


Use an aggregate function to group by one or more columns
In this example, your goal is to summarize the total units sold at the
country and sales channel level. You'll use the Country and Sales Channel
columns to perform the group by operation.
- Select Group by on the Home tab.
- Select the Advanced option, so you can select multiple columns to
group by.
- Select the Country column.
- Select Add grouping.
- Select the Sales Channel column.
- In New column name, enter Total units, in Column, select Units.
- Select OK

This operation gives you the following table.

Operations available
Perform an operation to group by one or more columns
Starting from the original sample, in this example you'll have a
column containing the total units and two other columns that give you the
name and units sold for the top-performing product, summarized at the
country and sales channel level.

- Use the following columns as Group by columns:


+ Country
+ Sales Channel
- Create two new columns by doing the following:
+ Aggregate the Units column by using the Sum operation.
Name this column Total units.
+ Add a new Products column by using the All rows operation.

You can select the white space inside the cell to see a preview of the
contents of the table at the bottom of the dialog box.

You might also like