0% found this document useful (0 votes)
22 views30 pages

1.-Sports Analytics - PowerBI

PowerBI Semana 2. Notas 1. Ingles

Uploaded by

alex.montes.9112
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views30 pages

1.-Sports Analytics - PowerBI

PowerBI Semana 2. Notas 1. Ingles

Uploaded by

alex.montes.9112
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 30

Data Modeling: Data Types, Split Columns, and

Merge Columns
https://radacad.com/choose-the-right-merge-join-type-in-power-bi

- There are dozens of different transformations we can make to our data using the features in
Power Query.
And we can't cover them all in this course.
But there are still lots of common transformations we can show you.
In particular, you'll probably use column transformations a lot to separate and merge columns for
various purposes.
For instance, imagine that we have a table with a list of companies, and each company has an
address, and each address is broken out into multiple columns.
We have a column for the street, one for the city, state, postal code and country.
Now suppose that we need to combine all those distinct fields together into one field to build each
company's complete address, so we can plot all of them together in a map visual in Power BI.
That's an example of a common reason to want to combine columns together.
In this video, we're going to have our first encounter with some dirty data that could potentially
mess up our data model.
To clean up the data, and make it usable in Power BI, we're going to need to use some column
transformations to deconstruct this data and then rebuild it so it's cleaned up.
To illustrate this, we're going to dip our toes in the world of sports analytics and use some
basketball shot data from Davidson's owned Stephen Curry, the three time NBA champion and
two time NBA Most Valuable Player, known for his phenomenal three point shooting.
This data comes to us via data. world and the Carolina Data Challenge, an annual event at the
University of North Carolina aimed at promoting interest in analytics and data science.
Let's start in a brand new report in Power BI Desktop.
As always, let's start by clicking our Get data button and then select Text/CSV.
Our first data set is the shot data itself contained in your exercise files under exercise two with the
name curryshots.csv.
Let's load this data set and take a look at it.
We have several columns here, for instance, season, location, shot value, shot distance and
outcome.
These are all important, but we don't need these first two columns that identify the player as
Steph Curry, because we already know he's the only player in this data set.
Let's remove those first two columns, as well as the game clock column.
Also, notice the full name of our outcome column is outcome one if made zero otherwise.
That's very informative because it tells us how to translate the data in the column, but it's a bit
long and awkward for a column header.
So let's rename it to just outcome.
Let's note the fact that the original column name tells us a one in this column, means Curry made
the shot, and a zero means he missed or the shot was blocked.
That's a nice clean way of representing the two possible outcomes of Curry taking a shot.
But it's not easy to interpret it in our data visualizations.
How does our audience necessarily know that one equals a make and zero equals a miss?
We need to be more explicit about what each value means.
So let's add a new column that categorizes each shot as either a make or a miss.
To do this, go to the Add Column on our ribbon and select the Conditional Column option.
A new window pops up where we can set the name for a new column.
And we have an interface that allows us to build logical expressions to evaluate our data and
calculate a new output.
We need to build an expression that says when the outcome equals one, then our output is make,
otherwise when the outcome equals zero, our output is miss.
So let's call our new columns make or miss.
And then let's go to the expression builder and select Outcome from the Column Name drop down
list.
Then make sure our operator is set to equals, then we tell Power Query which value to look for,
in this case, one.
Then in the output box, we just write the word make.
Now we need to add a second clause to our expression for misses.
So we'll click the Add Clause button, which adds a new row we'll again set the column to outcome
and our operator to equals, but this time, we'll set our value to zero and our output to miss.
There's one final box where we can assign a value to any and all other values that do not meet
these criteria we just created.
For example, we could write N/A for not applicable.
But I'm just gonna leave it blank, and click OK.
And now you can see we have our new column, which we're going to drag and drop next outcome
so we can compare them side by side.
You can see then how the conditional column transformation allows us to evaluate the data in one
or more columns and use that information to generate a new column to enhance our data set.
Finally, let's look at this date column.
Notice how it's not formatted as a date, but it's really just a series of numbers.
We can probably figure out what dates these are just by looking at them.
But Power Query does not recognize these as dates.
If you look at the little icon, to the left of the column name, it shows a 123 icon, which means
Power Query thinks the column is just a bunch of numbers.
To confirm this, we can right click on the column header to expose our column options, and then
go down to Change Type, which will show us the list of possible data types in our data model.
As you can see, Whole Number is currently selected.
But we have options for many different data types, including dates, times and durations.
If we load this column into Power BI Desktop without changing it from a whole number data type,
Power BI is just going to try to sum them instead of treating them as dates.
Let's see if we can make this work by just switching the type to Date.
And that didn't go so well did it?
It seems to think Steph Curry will still be alive and winning championships in the year 2181.
Let's go to our data model recipe and use the red X on our Change Type step to undo what we just
did, and go back to the whole numbers.
The reason you got these weird dates in this column is because they're whole numbers.
In Microsoft Excel workbooks dates are stored as sequential serial numbers.
If you type in the date January 1st 1900 into an Excel spreadsheet, then reformat it as a number,
Excel will give you the number one.
If we type the date January 1st 2020 into that same spreadsheet and format it as a number,
Excel will give us the number 43,831.
Each date from the first day of the year 1900 has its own number assigned to it, and each
subsequent date iterates up by one.
Excel works this way because it makes it easier to run calculations across dates.
This matters because when we take these whole numbers in our original data set and tell Power
Query to convert them into dates,
Power Query seems to think we're giving it Excel serial numbers for dates and the results it's giving
us our way off.
Furthermore, if we scroll down through the data we see that some numbers are five digits instead
of six digits.
Looking closely, we can see that the data is sorted chronologically.
And at row 315 of our data set, the numbers suddenly go from six digits to five digits.
It looks like this is where we transition from here 2009 to 2010.
And we can conclude at this point that when we start from the right most number, the right two
digits are the year, the middle two digits are the day, and the left one or two digits are the month.
We can confirm this is true by going into our search engine and searching for Golden State
Warriors January 2 2010.
We get search results to tell us Steph Curry and the Golden State Warriors played the Portland
Trailblazers on that date.
And sure enough, our opponent column says POR for Portland.
So we know how our dates are formatted.
This is a good example of dirty data.
Any attempt to load this data into a database or a data modeling tool like Power BI, or another
data viz software like Tableau is going to cause issues because these aren't valid dates, but with a
little bit of know how we can break down this column into each month, day and year, and then
rebuild them into a valid date.
First, using the column options menu, let's convert our date column from a whole number into
text.
This will allow us to insert the appropriate slashes to separate months, days and years.
Then, let's go to the Transform tab on our ribbon and look for the button called Split Column.
The split column transformation gives you several methods for taking an existing column of data
and breaking it out into two or more columns, whether that's by a specific delimiter, by numbers
or non-numbers, or just an arbitrary number of characters.
For this, we need to choose By Number of Characters.
A pop up window will appear that allows us to split the column into two or more columns based
on whatever number of characters we tell it to use.
We know that the four right most characters are our year and day.
This is consistent throughout the whole data set.
So we can confidently split the column this way.
So we put four in the number of characters box and then we select the split type as once as far
right as possible, to tell Power Query to start from the right side of the text.
Then we hit OK.
Now, the date column has been split into two columns, date.1 and date.2, with date.2 having our
four right most digits and date.1 having no remainder.
But look at our recipe and notice how in an attempt to be helpful
Power Query has automatically converted these columns back into numbers with a changed type
step.
We still don't want these to be numbers.
So we're going to click on the red X in our recipe to delete that step.
Next, let's select date.2, and repeat the split column operation, so we can separate days from
years.
But this time, let's use two characters.
And let's start from the far left.
Click OK, and watch as date.2 get split into two columns date.2.1 containing our day of the month
and date.2.2 containing our two digit year.
Also notice how once again, Power Query is trying to be helpful by automatically converting these
new columns into numbers with another changed type step in our recipe.
We still don't want these as numbers, so we clicked red X, to delete that step one last time.
We're almost there.
The next thing we need to do is merge these columns back together, but we need to separate
each column with a forward slash.
So we keep our months, days and years from running together like they were before.
To do this, we're going to select these three columns by shift clicking all three of them in order
from left to right, months, days, and years.
Then we're going to right click on one of the column headers and select the Merge Columns
option.
The merge columns transformation allows you to take two or more columns and combine them
into one and you can choose a character like a comma, hyphen, space, or a custom character like
an ampersand to separate those fields from each other in your new merged column.
That's exactly what we need to do here.
We need to combine month, day and year into one column separated by a forward slash.
So let's tell Power Query to use a custom separator, and then enter a forward slash in the second
box.
Finally, let's name our new column date, and click OK. Not bad.
We now have a date column that looks much more appropriate.
But look at our columns data type, and note how it's still not a date yet.
Let's right click, go to Change Type, and select Date.
Notice how the icon in our column header changes to a calendar.
And also notice how Power Query automatically adds the century to our two digit years, so 09
becomes 2009.
Now we've got perfectly valid data to work with.
Next, let's review our recipe and look at the names of these steps.
They don't really tell us a whole lot, do they?
Granted, these aren't, there aren't many steps in this recipe.
So we'll probably remember what each one does to build our data model.
But suppose you have a longer recipe and you want a better description of what each step does.
You can read that these steps and give them descriptions by right clicking on each one.
In this case, I'll click on Properties, which opens a window where I can enter a new name and
description of what this step does.
After doing this a few more times, we've got a recipe that's a bit more explanatory.
Note how it took us five additional steps in our recipe to do all that data wrangling.
There are other more straightforward ways to do what we did, but they all involve writing
relatively complex formulas in the Mquery language.
If you work with Power Query enough and start to get really good at writing formulas in M, you
could craft a formula that condenses all five of those operations we did into one.
However, since this is a beginners course, we don't cover that more advanced method.
In this video, we imported our shot data for Stephen curry, cleaned it up and introduced you to
data types as well as the split column, merge column and conditional column transformations.
We also went a little deeper on how to work with dates in your data model and some of the things
you may need to do to get your data into a proper format for Power BI.
Working with dates in Power Query, or in data modeling in general, can be a tricky exercise,
primarily because there are so many different ways of storing and interpreting dates.
Whether that's serial numbers in Excel, or two digit years, or through outside North America,
maybe you typically put the day of the month before the month itself like 26 January instead of
January 26.
All of this variation can make formatting dates correctly for Power BI a difficult process.
Fortunately, as we've shown here, Power Query gives us all the tools we need to deconstruct and
rebuild dirty data into something that is much cleaner.
In the next video, we'll keep going with Curry's basketball data and show you how you can
combine multiple data sets together.
Data Modeling: Merge Queries, Column Profiling
- In this session, I'm gonna introduce you to a useful feature in Power Query that lets you combine
two data sources into one, creating a joined table in your data model that contains data from both
original sources.
For those of you who have done a lot of work in Microsoft Excel, this will probably remind you of a
VLOOKUP formula or index match.
For anyone with SQL database experience this is equivalent to a join operation.
We're still working with our Stephen Curry NBA shot data, and you should still have Power Query
open at this point.
In the last video, we cleaned up the shot data a lot, but I don't like the fact that all we have for the
Golden State Warriors opponents are a bunch of three letter abbreviations.
If you pay a lot of attention to the NBA, maybe you already know what these abbreviations stand
for, but the people like me who don't really pay a lot of attention to sports, it might not be that
clear.
So we need a list of the team names that we can add to our data model.
We're going to add it by going to new source in the home tab of our Power Query ribbon and
selecting text or CSV.
You'll find this list in your exercise files as NBA_teamlist.CSV,
Because CSV files or plain text power BI, sometimes doesn't recognize the column headers within
the file.
So we may need to manually promote them to become column headers.
We can fix that by using the promote row headers transformation on our home tab.
So you can see in our NBA team list that the abbreviation and franchise row headers are in row
one.
And to promote the headers, I will just click the button that says use first row as headers.
Now let's return to Power Query and get to the fun part.
You'll often need to combine data sources in Power Query.
And one of the quickest ways to do that is with the merge queries transformation.
Just as its name implies this is very similar to the merge columns transformation that we used in
our previous video.
We're going to essentially combine our big CSV file of shot data with this new table of NBA teams.
Let's start by selecting Curry shots from our list of queries on the left hand side of Power Query,
and let's find our opponent column and rename it as opponent abbreviation.
Then let's go to the home tab and click the merge queries button.
A window will pop up, which lets us specify what we want to merge and how.
In this case, we only have two tables to choose from because we started from the Curry shots data
it's already visible in the top half of the window.
Directly below it is a dropdown list where we can choose our other dataset, the NBA team list.
And now you can see the two data sets on top of each other.
Now in order to merge two queries, we have to select a column that they have in common that
can be used to match records between them.
In this case, we want to click on our opponent abbreviation column in the Curry shots dataset,
then click on the abbreviation column in our NBA team list dataset.
Down below the second dataset, you can see, we have a dropdown list where we can select the
join kind.
For people without much database experience joined kinds or joined types might be unfamiliar,
but Power Query helps you by giving a short description of what each join type does.
It defaults to a left outer join, which gives you all the records from your first table.
Plus all the matching records from your second table.
This is what we want for this lesson, but take a look at the text just below our drop down list
where it says the selection matches 12,039 of 12,155 rows from the first table.
This tells us that Power Query was unable to find a matching abbreviation in our NBA team list for
116 records.
It seems we may have stumbled upon some old dirty data. We need to clean up.
So let's click okay to merge our queries and start that process.
We can see that there's a new merged query step in our recipe.
But when we scroll right to look at the new column, it looks a little weird.
Every row says table in yellow text.
This is Power Query telling us that it has created a relationship between our queries, but we need
to define what columns we want to pull in from the second table.
There's a small button on the far right of our column header that looks a little like two diverging
arrows. This is the expand option.
If we click on this, a small menu appears that gives us a list of the fields in our NBA team list query
and lets us choose between expanding while aggregating those fields into our original table.
Expanding means we'll actually bring in the original values.
Aggregating will let us bring in account of those values. We want the values themselves.
So we'll stick with the expand option and de-select abbreviation from the list of fields because our
original table already has an abbreviation and we'll turn off this little option at the bottom that
says use original column name as prefix.
We're turning off this option because it will make our column names shorter.
Finally, we'll click, okay and watch what happens.
As you can see the original column that just said, table has gone away
and has been replaced by the full names from our NBA teams list.
Again, this is very similar to a VLOOKUP formula in Excel or a joint operator in SQL.
It's actually better than a VLOOKUP in Excel because we can easily bring in multiple fields from the
secondary table as part of this process,
you can't do that with VLOOKUP. A little more cleanup is needed.
So we'll rename our new franchise column as opponent.
And then we're going to reorder the in our dataset by clicking and dragging opponent until it's
over here by the abbreviation column.
Now let's go back a step and remember that Power Query was able to match all but about a
hundred rows when we merged these two queries.
Well, what happened to those remaining rows? Let's use Power Query to find out.
And data modeling we often need to perform data profiling, which is the process of examining our
data and summarizing information about it.
We do this in order to look for anomalies like invalid values, empty rows, and other things
that might indicate dirty data.
Power Query gives us a set of very useful tools for data profiling.
So let's go to the view tab on our ribbon, then go to the data preview section and click to turn on
column quality, column distribution and column profile.
After enabling these options, we should see a bunch of new visuals on our screen.
At the top of our data preview right below each column header we should now see a little box
that visually represents the distribution of column of values in each column.
For instance, if we look at location, we'll notice that our values are limited to either home or away.
And the chart below our column header has two bars representing those two distinct values home
or away. In contrast if we look at the opponent column, you'll see a lot more variation as Golden
State has played lots of opponents over the years in this data set.
Above these charts, we'll also see the column quality feature, which scans the data and looks for
valid rows, rows that cause errors or rows that are empty because they have no data.
This tool can be great for getting a sense of what's going on with your data before you load it into
your data model.
The most noticeable thing about what we just activated though, is the column profile feature,
which takes up almost the entire bottom third of our screen now.
With this feature, we can select a column here in Power Query and the column profile will give us
greater detail, including a bar chart that tells us which values appear most in our column.
But this doesn't tell us yet what the issue was with matching our opponent abbreviations
when we merged our queries. If we look at the very bottom of our screen right below the column
profile, we'll see a bit of text on the status bar that says column profiling based on top 1000 rows.
Our career data set contains a lot of rows, but right now the column profilers basing its analysis on
a fraction of the overall data set.
If we click on this bit of text, we can see it's actually a button that exposes a list and we can change
this setting to column profiling based on entire dataset.
Now Power Query is going to scan the entire data set and not just the top 1000 rows.
And we get a much, much more holistic picture of our data, distribution and quality.
For instance, if we click on opponent, the column profile shows me that we have more records in
our dataset with the Houston Rockets as an opponent than any other team in the NBA.
Since this is a data set of shots, Steph Curry has taken over the past several years we can
reasonably infer from this column profile that Curry has attempted more shots against the Rockets
than against any other team.
Why could this be? Probably because the Golden State Warriors and the Houston Rockets have
been among the best teams in the NBA for the past several years and aside from playing regular
season games against each other, they've also played a lot of playoff matches against each other in
the post season.
So Curry has had many more opportunities to attempt shots against the Rockets.
So we haven't even started working on our report yet, and we're already getting some insights.
But notice something else in our column profile.
Notice where the column statistics box says that 116 rows are empty.
Meaning we have 116 rows where there is no opponent in our dataset.
This is the result of our merge queries transformation earlier when Power Query could find
matches for all but 116 rows.
Let's see if we can track down those empty rows and find out the source of the issue.

Part 2
- [Narrator] In our last video, we did a merge queries transformation and Power Query could find
matches for all but 116 rows.
Let's see if we can track down those empty rows and find out the source of the issue.
Let's go to our opponent column and click on the menu, which gives us options for how to filter
the column.
What we see here is an incomplete list of opponents in our column because Power Query is only
previewing the data for us.
You can see this by the part down at the bottom. It says list may be incomplete.
To show all the possible opponents we need to click the load more button at the bottom of our
menu, and Power Query will bring in the rest of the opponents.
And the first one we can see is a no value, meaning the row is empty.
If we turn off, select all and then click the no value and click OK,
we can see that there is an opponent abbreviation of CHO, in our data set that was not matched
when we merged our queries.
So let's grab one of these dates and do a quick Google search of Golden State Warriors number of
November 15, 2014.
And we find out that they played the Charlotte Hornets that day.
Let's go back to the filter for opponents and turn on both the empty rows and Charlotte Hornets.
We can now see that whoever supplied this data set mistakenly coded the Hornets with
abbreviations of CHA and CHO.
Again, we found some messy data, it happens.
Fortunately, Power Query is here to the rescue once more, and we have the ability to clean it up.
Before we go any further remember that our filtering on this column has added a step named
filtered rows to our recipe.
So let's go over there and delete that step by clicking on the red X so we can undo that filter.
Next, let's use our recipe to take a trip back in time to just before we merged our two queries
together.
Click on the renamed columns step right before merged queries.
And you can see that we're now viewing the data as it existed before we combined our two tables.
We're going to replace the incorrect CHO values with CHA. Right click on the opponent
abbreviation column, and select replace values from the menu.
The first thing that's going to happen is that Power Query is going to display a window asking you
if you really want to insert a step in the middle of your recipe, and warning you that doing this
could potentially break future steps in your query.
This is a crucial warning.
While you can use this technique to insert steps into the middle of your recipe, you should always
be careful when doing so because it can have some big repercussions for your data model.
For example, renaming a column. If later steps in your query expect that column to have one
name, and you change it to something else, it will break your recipe, and you'll need to spend
some time fixing it.
However, for our exercise, what we're going to do here is just replace some row values, and that's
fairly harmless.
So let's click the insert button and continue on. At this point, another window will appear
with a set of options.
The replace values transformation lets you find one value within a column and replace it with
another value.
We're going to find CHO values and replace them with CHA. This could be just a section of the text,
or you can use the advanced options to only replace values when it matches on the entire
contents of a cell.
Let's match on the entire cell contents just to be safe, and click OK.
Now that we've done this, notice a new step has been inserted between renamed columns in
merged queries.
Let's click on the final step in our recipe to look at the final state of our data.
And look at the column profile for the opponent column now. Notice how our number of empty
rows went from 116 to zero, On replacing those bad CHO values with CHA, Power Query was
finally able to find matches for those 116 rows.
And now we have matching records for 100% of the rows in our data set.
The last thing we're going to do, is go to our queries pane and right click on the NBA team list
query to expose a little menu of query options.
You can use this menu to duplicate, delete or rename queries as well as reorder them and group
them together in folders, which is especially useful if you have a very large complex data model.
What I want us to do here though, is to turn off the Enable load option.
You'll see that the name of our NBA team list query becomes italicized.
Simply put, turning off the enable load option keeps this particular query from being loaded
into our Power BI data model.
Why would we want to do this? Well, because we don't need it anymore.
As a result of merging our two queries together like we just did, the team names are now part
of the Curry shot dataset.
Loading the smaller table into our data model would be redundant and unnecessary. by disabling
the load of our team list data set, we can keep our data model in Power BI Desktop simple and
streamlined.

Part 3

- Before we finish this lesson, you probably notice that there were several other types
of joins that we could use in our merge query step.
Software Developers often use the example of a Venn diagram when describing different types
of table joins and what they do, where we have two circles overlapping.
For instance, the left circle would be our Steph Curry shot data.
And the right circle is our NBA team list. With the left outer join we did earlier with shading that
left circle which represents us keeping all the rows from the table.
And let's keep the right circle unshaded to represent that we only bring in matching rows from the
right table.
Another common join type is an inner join, where we keep only the rows from both data sets that
have matching records.
In this case, imagine a Venn diagram with only the middle overlapping part shaded in.
This would represent only those rows that matched in both queries.
Then remember when we first did the merge transformation and there were 116 rows that didn't
match.
If we had done an inner join instead of a left outer join, our merge transformation would have
returned all of Curry shots except for the 116, that didn't match.
As a last example, if we had done a left anti-join, imagine that the left circle on our Venn diagram
is shaded in once more, but the right circle doesn't just overlap it, it covers it up entirely.
This is a common illustration for a left anti-join, which is where we only keep the rows from our
left table that did not have a match.
It's like the opposite of the inner join example we just talked about.
Instead of returning all but the 116 unmatched rows, a left anti-join would only return those 116
unmatched rows.
For more information on how to merge queries and the different join types,
I encourage you to click the link below and read this article by Reza Rad, a highly influential Power
BI expert and teacher. Reza goes into more detail and also uses the Venn diagram example to
illustrate the different types of joints.
So, to recap, we added the data from a second CSV file to our data model and Power Query,
combined it with the data from our first CSV file using the merge queries transformation,
explored how to use the column profile and other tools to analyze the quality
and distribution of your data, and use that as a way to track down and address issues
with the cleanliness of our data.
There was a lot of information, but we successfully tackled some of the most useful tools in the
Power Query toolbox. And now we're ready to start visualizing some of this data and analyze just
how good of a shooter Steph Curry really is. Stay tuned.

Which of the following is not a use for the Column Profiler?


You want to substitute new data in the column.

Which join type would we use in a Merge Queries transformation if we


wanted to keep only those records that exist in both the left and right
table?
Inner join

True or False: Data profiling is the process of examining our data and
summarizing information about it
True
Defining Our Measures
- Now that we've used the last few videos to prepare our data model of Stephen Curry's shooting
history for the Golden State Warriors, we can begin plotting this data in Power BI Desktop.
As always, you're welcome to follow along with me or attempt this later.
This is going to be a more complicated report, so we'll break it down into multiple videos, and
throughout, we'll introduce you to more features of Power BI.
Let's get started. We're back in Power Query, where we ended the previous video.
Before we load the data model, let's cover one last topic with data modeling.
Specifically, index columns. In Power Query, an index column is a simple column
you can add to your query that generates a different number for each row
in the data set.
It acts as a unique identifier for each record in your table.
In database design, we refer to these unique identifiers as primary keys.
Sometimes they appear as long randomized text strings, and other times they can just be whole
numbers starting at zero or one and incrementing up into the millions and beyond.
Other times, primary keys can be composite keys that are created by combining two or more fields
together, similar to how we merged columns in this data set to create our Date column.
Primary keys are a vital part of database design and data modeling.
Besides uniquely identifying each row in the table, primary keys can also be used to create
relationships with other tables, and we'll cover that topic a bit more in a later video.
Some data sets we work with may come with unique identifiers for each row already,
in which case, we maybe wouldn't need to add an index column.
For what we're doing here today, our main purpose in adding an index column to our shot data is
so we have a unique identifier attached to each row to each shot Steph Curry took.
This is going to be important when we start creating our measures in Power BI Desktop.
So to add an index column, let's go to our Add Column on tab on the ribbon, then find the Index
Column button.
There's a drop-down menu here that will let us choose from different options on how to create
our index column such as starting the number from zero, or starting from one,
which is largely a matter of preference for the person creating the data model.
I would like to start from one, so that's what we're going to select, and Power Query is
immediately going to create our new index at the far right end of our table.
I'd rather it be at the start of the table, so we're going to right-click on Index and tell it to move to
the beginning of the table.
Now it's our fist column in the table. Lastly, let's give it a friendlier name than Index.
Let's double-click on the column header and rename it as Shot ID.
Now that we've added our index, let's hit Close and Apply to save our data model
and bring everything into Power BI Desktop.
Once the data model is loaded, let's start by adding a few items to our page.
First, since we're dealing with over 12,000 shots
across nine years of data, let's do a little filtering.
We're going to add a slicer to the top left of our report, and then we're going to drag and drop our
Season column onto that slicer.
When we first drop Season onto our slicer, it will give us a list of seasons to choose from,
but I'd rather see this as a drop-down menu, so I'm going to use the slicer header to change this
into a drop-down.
And then resize the slicer so it doesn't take up as much space on the page.
Next, I'm going to add a simple table to my report page, and I'm going to add Date and Opponent
to start with.
Notice how the Date column looks. The data's been broken down into its individual components
similar to what we did in Power Query.
We can see the year, quarter, month, and day of each date. This is the date hierarchy, and Power
BI defaults to showing dates this way because it makes it very easy to aggregate values at different
periods.
For example, if we were analyzing sales data, using a date hierarchy would be a really easy way
to get a total of our company's sales for the month, quarter, and year.
A very useful feature, but not really what we want to use for this exercise.
To change the way the date is represented in this table, all we need to do is select our table, go to
the Visualizations pane, find our Date Hierarchy and click on the down arrow icon to bring up the
options.
You can see at the bottom of the menu our Date and Date Hierarchy, and Date Hierarchy is
selected.
I'm just gonna switch this to Date, and now my dates are no longer broken out into a hierarchy.
Now the next issue is our dates are in a format I don't like.
These dates are way too long, and I really don't care what day of the week the game was played
on, although maybe it would be interesting to analyze if Curry shoots better on Saturdays instead
of Sundays.
But let's stick to the plan. When we changed how the numbers in the countries data set
were formatted in our week one lessons, we just clicked on a column from the report editor and
used the Column Tools tab to change them.
But I'm now gonna show you a way, another way, to do this using the data view.
I'm going to navigate to my data view, find my Date column, and then go to the Column Tools tab
and change the format to a much shorter one like M/D/YYYY.
That will save space in the report and it still gives me the information I need.
While we're in the data view, we're also going to reconfigure two more fields.
The Left.px location and Top.px location fields give us the coordinates on the basketball court
where Curry took each shot, and they're going to give us the opportunity to create some really
cool visualizations soon.
But before we can do that, we need to change how they're formatted.
Let's click on the first one and go to the Column Tools, find the Properties section, and change the
Summarizations setting to Don't Summarize.
Then repeat this process for the second column.
Changing this setting is going to tell Power BI Desktop to not automatically aggregate these fields
any time we add them to a new visual, because these fields are not designed to be aggregated.
They're just coordinates, and summing or averaging them would not make any sense, and would
definitely mess up our visualizations.
Now I think we're ready to define some measures. Let's stay on the Data tab for this,
go to our Home tab, and click New Measure.
The first measure we want to create is our most basic measure,
Total Field Goals Attempted. This will be our most important measure because it's the root of
some many other measures.
We're going to reuse it several times, and this is where our index column comes back into play,
because my formula is going to say simply
Total Field Goals Attempted equals Distinct Count of Curry Shots Shot ID.
In this DAX formula, our shot ID, the index we created earlier, uniquely identifies each row in our
data set.
In each row is an attempted shot by Curry, so all we need to do is write a distinct count function
which looks as the Shot ID column and counts each occurrence of a number in that column.
We should get hundreds per season, adding up to over 12,000 overall for the entire data set.
But that's just the beginning. Let's use our new Total Field Goals Attempted measure
to create some more.
In our next one, we want to return all the field goals that Curry made, which will later allow us
to calculate his shooting percentage. To do this, we're going to use a little function called
Calculate, which takes a measure and filters it by a set of criteria.
In this example, we're going to take our Total Field Goals Attempted measure and filter it by shots
that Curry made where outcome equals one.
It should be obvious, but FG in this formula stands for field goals.
Calculate is one of the most important functions in DAX because of its ability to alter how
measures are calculated, and is one of the reasons why Power BI professionals emphasize the
benefit of explicit measures being reusable.
For me to get the vast majority of measures I need for this exercise, all I need to do is define my
Total Field Goals Attempted measure once, and then I just create other measures that reference
that first measure and filter it in whatever way I need.
Let's do a few more. So for Total Field Goal Percentage, I'm just going to divide the total field goals
made by the total field goals attempted, and then, in the last argument, I'm going to put comma
zero to just use this alternate result for whenever we have to divide by zero.
For three point field goals attempted, I just need to use the Calculate measure and tell it to
calculate the Total Field Goals Attempted measure, but to filter on, to filter on rows where the
shot value is equal to three.
Finally, for Three Point Field Goals Made, we can calculate it two different ways.
The first way I can calculate the Total Field Goals Made measure and just filter it by ones where
the shot value equals three. Or I can calculate it by taking the Three Point Field Goals Attempted
and say where outcome equals one.
And for our final measure, for Three Point Field Goal Percentage, we just need to divide the Three
Point Field Goals Made measure by the Three Point Field Goals Attempted measure,
and again, we're going to use the comma zero to tell DAX what to do if it ever encounters a
situation where it divides by zero.
Now let's go back to the report editor and add all of these measures we just created to our table,
starting with Total Field Goals Attempted and ending with Three Point Field Goal Percentage,
and look at the results.
At this point, we've defined six explicit measures using a distinct count of our index column as a
sort of base measure, or root measure, from which all other measures are derived.
You can see how I reused these measures over and over again to keep my formulas simple and
clean, and you can see how we used Calculate to change the context of each measure
using specific filter criteria.
This yields a table full of shooting stats for Steph Curry's 2018 season with the Warriors.
Now let's do a little formatting and organization of our measures by heading to the Model tab.
In week one, I mentioned how we can use the model view to see all the tables on our model
and how they relate to each other.
In this case, we once again only have one table, so that part isn't very interesting at the moment.
But what might be more interesting is how we can use this tab to efficiently format
and update our measures using the Fields pane on the far right.
To illustrate the possibilities here, we're going to Control click to select all six measures we
created.
And you can see there's a Properties pane here which gives us details about the group of
measures.
We can create a description for each, and change how they're formatted.
To illustrate the possibilities here, we're going to Control click to select
all six measures we created, and you can see there's a Properties pane here which gives us details
about this group of measures.
We can create a description for each, and change how they're formatted.
But for now, all we're going to do is fill in this little box called Display Folder
with the word Metrics, and hit Enter.
This is going to group all of our measures together into a little folder within our data set,
making them easy to find.
Then, we're going to Control click all the fields from our original data set, all the things that are not
measures, and put them into a display folder named Fields.
This will in turn keep them all together nice and organized.
And now let's go back to our report editor. Notice how our Fields pane is now much cleaner now,
and everything is grouped together.
Using display folders is a great way to organize your data models, especially as they get larger and
more complex.
Now that we've defined our most important measures, this is a good place to take a break.
We've added an index column to our data model, reformatted a few fields, introduced you to date
hierarchies and how to handle those, and defined the explicit measures
that are going to be the focal point of our report analyzing Steph Curry's shooting data.
In the next video, we're going to build our first visual of the shooting data, and we're going to
break down those shots by their distances from the basket.
See you in the next lesson.

Which of the following is NOT a correct reason to create an Index


Column?
To establish a date hierarchy

The date hierarchy option in Power BI by default breaks down a column


of dates into its individual components, making it easy to aggregate
values at different periods.

How did we get a count of the number of unique items in our data model?
Use the DISTINCTCOUNT function in DAX with the ShotID column
Sync Slicers and Scatter Charts
- The previous video saw us laying a strong foundation for analyzing Steph Curry's NBA shooting
data by defining the measures that will be the focus of our analysis.
In this video, we're going to visually plot Curry's shot data on the basketball court using a scatter
chart, also sometimes referred to as a scatter plot.
And we're going to use a feature called Sync slicers to reuse the same filters across all the pages in
our report.
Follow along with me, and let's start plotting our data.
We're back in Power BI Desktop, picking up from where we left off.
The first thing I want to do is create a new page in my report.
So I'm going to go to the bottom of my screen and hit the yellow plus sign button next to page one,
which will create page two, a new blank page.
Okay, we're good.
Now one objective of our report is going to be for our audience to select a particular NBA season
and navigate across multiple pages, seeing that same season the whole time.
They shouldn't have to select the season every time they move to a different page.
In the last video, we added a slicer to page one that allows users to choose the season.
But that slicer only works for page one. How can we make it work for page two as well?
Through the magic of slicer synchronization.
In Power BI Desktop, you can create a single slicer on one page, and then reuse that same slicer
across multiple pages with a few clicks of your mouse.
And those slicers will stay in sync with each other if you tell them to do so.
To illustrate how to do this, let's go to our view tab on our ribbon and look at the button on the far
right named Sync slicers.
Clicking this will open up a new pane in our report editor that we haven't seen before.
Right now it's blank, so let's navigate back to page one and select the slicer we created in our last
video.
Now when you look at the Sync slicers pane, we can see this little grid that lists
each page in our report, and we have a column with a refresh symbol, followed by a column with
an eyeball symbol.
The first column represents synchronization. Checking the boxes here will make it so that
any change to your slicer will be applied to the slicers on other pages.
If you filter the data by the 2018 season on page one, it will also filter the data on page two using
that same season.
The second column represents visibility. Checking the boxes here will control whether or not your
slicer is visible on the other pages.
Note that your slicer does not need to be visible on all pages.
With the visibility controls, you can have it sync to another page, but be invisible.
With this technique, you could have a cover page where your audience selects value from a drop
down list, and then that controls the data in all subsequent pages.
And you would need to navigate back to the cover page to change it.
That might be useful in some scenarios, but in our case, I would actually like the slicer to be visible
and synced on every page.
So I'm going to check all the boxes. Now when we go back to page two, we'll see that the slicer we
originally created on page one is visible here.
And if we change the value using our drop down list and go back and forth, we can see that the
values change across pages, so our slicer is synchronized.
Next, we're going to add an image of a basketball court to page two.
This image is contained in your exercise files as a JPG file named NBA Court.
From the Insert tab on our ribbon, we're going to click the image option, then browse to our
exercise files and select the NBA court JPG.
Now that it's on our screen, let's resize it a bit because it's going to be the main feature on this
page.
We want it to be nice and big and noticeable. Next, let's bring in our shot data as a scatter chart
by going to the visualizations pane and finding the little icon that looks like a bunch of polka dots
arranged between two axes.
Scatter charts are typically used to plot two sets of variables, so we can visually look for
relationships between them.
For example, if we wanted to analyze causes of climate change, we might build a scatter chart
where we plot global temperatures on the x axis, and on the y axis, we can put atmospheric
carbon dioxide concentration or methane concentration, and use the dots in our chart to observe
the correlation between temperatures and the amounts of CO2 or CH4 in our atmosphere.
This will drop our chart onto the page and now let's go ahead and drop in our fields.
We're going to add Make or Miss to the Legend, then we're going to add our left location to the X
Axis, and our top location to the Y Axis.
Make sure that each of these fields in your visual is set to Don't Summarize by clicking the field
options.
When we do this, we get a very elongated visual that kind of resembles a horseshoe,
which is all of Curry's shots for the 2018 season in this data set.
In geometry, this kind of system where we plot points in a space using numerical coordinates is
called a Cartesian Coordinate System.
Cartesian coordinates are a fantastic way of visualizing sports data, because most sports take place
on a bounded two dimensional playing surface like a field or a court.
Obviously, we're looking at basketball shot data in this exercise, but I've also worked on a project
in Power BI in the past, where I used these kinds of two dimensional coordinates to show players
passing or shooting the ball in a soccer match.
So now that we've added our fields to the scatter chart, we can see it looks a little messy.
Our x axis is way too skinny, which makes the whole chart appear too elongated.
We need to resize it so it better fits the size of our basketball court image.
And then we need to adjust our x and y axes and change some other things about the appearance
of our chart.
To do this, I'm going to introduce you to a critical part of visualizing data in Power BI Desktop,
which is the format settings.
All visuals have format options that you can configure in Power BI Desktop,
including fonts, colors, shapes, titles, interactions, and so forth.
And these options vary between different types of visuals. For instance, a pie chart visual obviously
won't let me format an extra y axis because pie charts don't have x and y axes.
Which format options are available just depends on what visual you're using at the time.
First, let's take a closer look at our scatter chart. You can see the x and y axis both meet in the
bottom left corner, at the number zero.
In coordinate systems, this is called the origin. From the origin, the y axis goes up to around 450,
and the x axis goes to the right until it gets to 500. And all of Curry's shots fall roughly within this
range.
This is going to be important in a few moments as we start changing the format settings.
Let's select our scatter chart, then go to the visualizations pane and locate the button that
resembles a paint roller, like you would use to paint your house.
Clicking this icon will change the format settings for our visual.
You can see over a dozen headings each of which can be expanded or collapsed to show their
settings.
We're going to change a lot of things here and this could get a little tedious, so follow along with
me carefully and I'll do my best to explain everything we're doing and why we're doing it.
First, let's look at the title. Under the title settings, you can see we can change the title itself,
the font color, the background color, the alignment, as in left, center, or right, in both the font
family and the font size.
If we wanted to alter any of these properties, this is the place to do it.
But I want us to get rid of the title on our chart because it's kind of ugly and it just takes up too
much space.
It doesn't give us any useful information. So we'll turn it off by just hitting the little toggle button in
the title heading that says Off.
Next, let's peek at the background settings. There are really only two settings, the background
color and the transparency setting.
So if we wanted, we could put some new colors behind our data, but that's really not necessary
here. In fact, we don't want there to be a background at all because the basketball court image we
imported earlier is going to be our background.
So let's also turn off the background settings. Now we get to the x axis.
There are a lot of settings here and we need to change a few, so watch closely.
First, we're going to scroll down and turn off the axis title.
It doesn't serve any purpose here, so let's get rid of it by clicking the toggle button.
Next, we need to set the start point and end point for the axis.
Why would we do this? Well, remember that charts in Power BI are interactive, and as you filter
data using slicers, or other visuals, oftentimes the scaling of the chart will change dynamically,
because more or fewer data points are becoming visible, and the chart needs to adjust the start
and end points of its axes to fit all of the data into the visual.
We don't want that to happen here because our background image is a static image of a basketball
court. If we let the x axis resize itself as we filter the data set, the points on our scatter chart will
begin to shift around and the visual may no longer make sense.
Remember, that the axes on our chart show the data ranging from around zero to about 500
on both the x and y axis.
So we're going to change the start setting of the x axis to negative 10, and the insetting to 500.
This will keep our x axis frozen in place and keep the coordinate locations consistent as we
interactively filter our data.
And then to finish, I'm going to turn off the x axis, so it's no longer visible.
Our settings are still there, and they're still in effect, but the axis itself is invisible so it doesn't
clutter our scatter chart.
Finally, let's repeat the same process for the y axis, we're going to turn off the axis title.
And then set the start point to negative 10 and the end point to 500.
And then we'll turn off the y axis so it becomes invisible.
At this point, the only things we should see in our chart are the data points themselves, and the
legend up at the top where the title used to be.
We're going to keep the legend because it gives us a nice visual cue to determine what color
means a make or a miss.
But let's tweak the colors a bit. Power BI has defaulted to a dark blue for miss
and a medium blue for make and I don't think there's enough contrast between those two colors.
Let's change these colors by going back to our format settings and find the data colors heading.
Here we can see we have two color pickers we can use, one for make and one for miss.
I actually like the blue for make because it's close to the shade of blue that the Golden State
Warriors use in their uniforms, so let's keep that the way it is.
But let's change miss to a yellow color, so we can match Golden State's other primary court color
which is yellow.
So there's a nice mustard yellow we can use that makes it easy to distinguish between makes and
misses.
And also styles are visual after Stephen Curry's NBA team.
I also don't think the dots on our chart are big enough, so let's go to the shape set and increase the
size of these dots from negative 10 to zero.
That will increase their diameter a bit and make them easier to see.
Okay, that was a lot of minor steps to take, but now we have a visual that's looking more polished.
For the final step, let's take it and drag it on top of our basketball court image, and then resize it a
little bit, to get the proportions right.
This part may take a little bit of trial and error. Notice how since we've turned off the background
of our visual, the scatter chart is 100% transparent and we can see the lines of the basketball court
clearly underneath our shot data.
I'm primarily using the arc of the three point line as a guide to get the shape right because you can
see pretty clearly where Curry's three point shots cluster together and also where his layups under
the basket cluster together.
We just need to align those clusters to the shapes on our court, and then we're done. Okay.
Quick note, if your scatter chart ends up behind your basketball court, you need to move it to the
front. Objects on a Power BI page are organized in layers from the back of the page to the front,
and whatever objects get added first are the backmost, while objects that were added later are
the frontmost.
You may have added the scatter plot before you imported the NBA court JPG.
To change whether your chart appears in front of or behind the image.
First, select your chart, then go to the Format tab on your ribbon.
There are two buttons named Bring Forward and Bring Backward that you can use to move your
chart to the back or to the front.
Before we wrap up, let's take a moment here to navigate through the different seasons using our
slicer in the top left corner, and observe how the data changes over the years.
In the 2010 and 2011 seasons, Curry's shot attempts seem more or less evenly spread out across
the court. When we go to 2012, we see the data gets a lot more sparse.
The season that began in 2011 and ended in 2012 was unique because the NBA lockout that ran
from July to December of 2011, shortened the season by about 20 games.
Plus, unfortunately for Curry, he missed another 40 games due to ankle and foot injuries, so we
don't have much data for 2012.
Let's settle on the 2016 season. The Warriors didn't win the NBA championship that year, falling
one game short to the Cleveland Cavaliers.
But Curry still hit a huge milestone by being the first player in NBA history to be unanimously
named the league's MVP, or Most Valuable Player.
So I think 2016 is gonna be some really interesting data to examine in the next couple of videos.
Let's recap this lesson. In this video, we plotted our shot data
on top of an image of a basketball court and got everything looking neat and clean by using the
format settings for our visual.
We showed you how you can adjust colors, x and y axes, titles and backgrounds,
These are just a few of the ways you can reformat visuals in Power BI Desktop.
And we also looked at how you can unify slicers across multiple pages using the sync slicer feature.

Which of the following options represent good uses for a scatter chart?
(may be more than one)
Plotting two measures to see how they might relate to each other
Plotting spatial (x,y) coordinates

What two variables did we use in this lesson?


Shot data of top location and left location
Grouping, Card Visuals, and Conditional
Formatting
- In our last video,
we plotted Steph Curry's shot data on our court.
Let's now take the 2016 season
for the NBA Golden State Warriors
and explore Curry's shooting in more depth.
First, let's analyze Curry's shooting distances.
Shot distance is one of the columns in our data set,
measured by the number of feet Curry was from the basket
when he took each shot
and ranging anywhere from zero to 39 feet.
We can easily begin by just creating a column chart,
dragging shot distance onto the axis
and dragging our total field goals attempted measure
onto the value.
Which gives us this chart
and this opens up some cool possibilities
because then we can click on the value 25
in our column chart.
And watch as it filters our scatter chart
to just the jump shots that Curry took
25 feet away from the basket.
Click through the values
to watch the shots get closer or further away.
This is great, but I wanna group these distances
into a set of ranges.
So I can see all of the close range shots together,
like dunks and layups
and maybe compare those to his mid range
jumpers and floaters and expand the comparison
to his long range three point jumpers.
To create ranges like that,
we're going to use a feature in Power BI Desktop
called grouping.
Start by going to your fields pane and right clicking
on the shot distance field to bring up our menu of options.
Then look down until you find the option for new group
and click it.
This will bring up a window
where we can establish our new group.
First we're gonna name our new group Shot Distance.
Next, there are two group types we can choose from.
The first group type is Bins
and it divides our data into equally sized groups.
This means if I set my bin size to seven,
Power BI will take the first seven values
in our shot distance column, and group them together.
Then we'll move on to the next seven
and group those together, and the next seven, and so on.
Until there are no ungrouped values remaining.
It's quick and easy.
Alternatively, we can use a second group type called Lists.
With lists we can specify which values
should go into which groups.
For example, I can group all of the shots
from zero to four feet together for the really close ups,
and then do a range of five to 12 for medium range shots
13 to 22 for longer,
and then 23 and up for the truly long range shots.
These groups are not equally sized, like the bins were.
I want us to do a list group here
because it will allow us to be very specific
about our ranges.
The three point shot line in the NBA
is about 23 feet from the basket.
So I wanna have a grouping of three point shots
where distances are greater than or equal to 23 feet.
Then we'll group the remaining shots into other buckets.
When we select the list type, our screen changes.
So we have two columns, our ungrouped values on the left,
which consists of all the distinct values
in our short distance column,
and the group values on the right,
which is empty because we haven't created any groups yet.
To start creating groups,
all we need to do is shift click on rows in the left box,
and then click the group button to group them together.
Then, we can rename our groups,
these groups sort alphabetically,
so we're going to give them names that will ensure
they sort in the order we want to see them.
So for the first one, we'll do zero to four feet.
For the next group, we'll do five to 12 feet
For our third group, we'll do 13 to 22 feet.
And for our final group, we'll do 23 plus feet.
After we've set our groups, let's click OK
to close this window.
Immediately our new group shows up in the fields pane
with a new icon next to it.
Next, let's create a new table on page two
to the left of our shot chart, and add four columns to it.
Shot distance, total field goals attempted,
total field goals made and total field goal percentage.
Now we can compare Curry shooting stats
across these different ranges.
And we'll notice that the vast majority of the shot attempts
occur beyond the three point line.
And he's still extraordinarily effective from that range
shooting 44.6% from beyond 23 feet.
For comparison, the NBA average that season
for three point shooting was 35%.
We can then select these rows in our table
to filter the other visuals on our page
by these groups we created.
To round out this page,
let's add a couple of card visuals
above our bar chart to display.
Card visuals are barely visuals at all,
they just display a single aggregated value
alongside a text label that tells you what it is,
like if you wanted to highlight the number of days remaining
until the key deadline arrives.
Or if you wanna display the name
of the highest ranked salesperson in your organization.
Cards are simple,
but they're a great way to display a single measure or value
that is important to our analysis.
In this case, we'd like to know Curry's median shot distance
and his longest shot distance.
So let's drop two cards on our page
by finding and selecting the card visual
from our visualizations pane.
It looks like a little white box with the numbers 123 in it.
Next, let's create two explicit measures.
Let's create these in our data view.
And then we'll go to the model view
so we can group the new measures
into the metrics display folder.
The first measure we're going to create
is median shot distance.
So we're going to use the median function
and we're going to tell it to use the shot distance column.
In feet.
That's just going to calculate the median shot distance.
We'll add another measure for the longest shot distance.
So we'll call that longest
shot distance.
And this time we're going to use a DAX function called Max.
And we're gonna feed it the same field as before,
shot distance and feet.
Once we've added two blank cards,
let's drag our new measures on to each one.
I need to add the new measures
we just created to our metrics table.
So I'll go to our model view and select both measures,
and then go to the display folder property
and type in metrics.
This will add them to the metrics table.
Now we go back to our report editor.
We're ready to keep going.
Let's drag our new measures on to each of our new cards.
And we see that in the 2015, 2016 season,
Curry's longest shot was 39 feet from the basket,
and his median shot distance was 23 feet from the basket.
Considering the number of long distance shots Curry takes,
it shouldn't come as a surprise that his median is so high.
Cards are so simple,
but they can be a key part of a well designed report.
Just the size of them makes them noticeable.
But the abundance of white space around them
also draws the eye in a way
as it stands almost in contrast
to the messy polka.at appearance of our shot chart.
And that's what we want
to draw our audience's eye to those critical numbers.
Towards the end of the course,
we'll talk more about design concepts
that help you make visually appealing reports in Power BI.
The last thing I wanna do on this page
is to introduce you to Conditional Formatting.
Conditional Formatting is the type of visualization
for tables and matrices in Power BI Desktop
that allows you to customize
the color or appearance of a cell based on field values.
This could be something as simple as
changing the font color or the background color of the cell,
depending on which numbers in the column
are highest or lowest.
Or we can use data bars
to show a small graphical representation behind our numbers.
Or we can use icons as a kind of warning indicator
and set up rules that say,
when a cell goes below this value,
display a bid up big red X symbol.
Whatever we're trying to do,
Power BI gives us a way to spruce up our tables and matrices
with conditional formatting.
So let's add a few examples to our report
to give us a better understanding.
Start by selecting the table we just added,
then look at our visualizations pane
and open up the options for our total field goals
attempted measure.
There's an option in here for conditional formatting
and underneath that heading,
we have five options to choose from.
Let's start with data bars.
When we select this option,
a window appears with the type of conditional formatting
we've chosen, and the name of our measure.
And data bar simply displays a horizontal bar in our cell,
and we can tell Power BI to make negative numbers
a certain color like red or positive numbers are blue.
We can set the minimum and maximum values
for the data bars x axis
by either sticking with the default
lowest and highest values, or we can enter custom values.
We can also change the direction
of the bar between left to right and right to left.
For this, let's just make our positive bar
a lighter shade of blue and make the negative bar yellow
and change our access color to a very light gray.
Then, let's hit OK and go back to our table.
Obviously there shouldn't be any yellow
because a player can't attempt
a negative number of field goals.
Instead, we get some nice light blue bars
behind our numbers in the table,
so you can help your audience visualize it more easily.
This makes it easier to see that
Curry's number of shots from 23 plus feet,
eclipse his attempted field goals from shorter ranges.
Now let's add some conditional formatting
to our total field goals made measure.
Select a table, go to the visualizations pane
and open the conditional formatting menu.
And this time, let's select background color
or font color.
It doesn't matter which because
they both have the same formatting options.
So I'm just gonna choose font color.
We can update font colors and background colors
in three ways, a simple color scale by rules
or by field values.
First, a color scale lets you choose between two colors,
a minimum and a maximum.
Any numbers falling between those two boundaries
will have a background color that also falls
somewhere between the two colors you select
and Power BI Desktop gives you a color bar in this window
to illustrate to you how your formatting will appear.
If you choose the custom option,
you can manually fill in a value
for either your minimum or your maximum.
You can also turn on a diverging option
that lets you set a color for the middle value.
And I'll show you an example
by making the middle a bright orange
so you can see how the color scale changes.
Second, let's switch to the format by rules option.
Now we can format our table
based on simple logical conditions
like we did in power query using
the conditional column transformation,
such as when the value of this field
is greater than x percent and less than y percent,
use the color purple.
Third and finally, we can format by a fields value.
This technique is helpful if you have color names,
or color hex codes in your data.
Like if your CSV file contained a color for each NBA team,
and we wanted to shade each team's table cells
based on that color.
That's not something we'll show in this course.
But I'll have a link for you in a moment
where you can get a better idea
of how to harness that method.
For our total field goals made measure,
let's do a rules based formatting
that says when the number of total field goals made
is less than 100, shade it red.
If it's between 100 and 200, shade it yellow,
and if it's greater than 200, shade it blue.
Then let's hit OK and check out our table again.
Finally,
let's format our total field goal percentage measure.
Back to our Conditional Formatting menu
and I'm going to choose icons this time.
As I mentioned earlier,
you can use this option to add little graphical indicators
to your cell and you have two modes to choose from,
you can do it by rules like we just did,
or do it by field values, we're going to do it by rules.
And notice how you can change the layout
by choosing if the icons go to the left
or right of the data, or the top or middle of the cells,
and so forth.
You can also use this handy style drop down
to pick a set of icons,
or you can use the rule section
to manually pick different icons.
I'm gonna use these little circular icons,
but then I'm gonna customize one of them
so you can see how it works and click OK.
Now we see our table again,
but I don't like how these icons
are right on top of our total field goal made numbers.
So I'm gonna separate them a bit
by adding a vertical grid line to my table.
To do this, I'm gonna go to my format settings,
open up my grid options and turn on the vertical grid.
This will put a vertical line
between each of my columns in my table
and make them hopefully more distinct
and easier to tell apart.
I really like conditional formatting,
it's a simple way to spruce up tables,
especially when your tables have
key performance indicators in them.
And you want to efficiently highlight
the good numbers from the bad numbers,
or the numbers that have changed the most or the least,
or the numbers that have met some kind of rule or condition.
There are a lot of great possibilities.
For more on Conditional Formatting in Power BI,
check out the link below.
Instead, I'd like to start a new page
so I can introduce you to a couple more types of visuals
and talk about some important design considerations
when you're choosing charts for your reports.
So let's go to the bottom of Power BI Desktop
and create a new page.
On page three, we're going to examine Curry stats
across the seasons in our data set,
so I'm not going to use a slicer.
Instead I'm gonna go ahead and drop a column chart
in my top left corner
and I'm gonna put a line chart directly underneath it.
Line charts function a lot like column charts,
they just visualize the data a little differently.
On our column chart, we have these prominent columns
that start at the x axis
and reach all the way up
until they reach the point on the y axis
where they need to be.
Whereas the line chart
just connects these points with a line
to give a better sense of the continuity between them.
Using a line chart can also make it easier
to interpret the increase or decrease
of something over time.
On both of these charts,
I'm gonna add the season field to the axis
and then I'm going to drop our median short distance measure
into the values.
Now let's compare these two visuals
and notice how the y axis on our column chart
starts at zero feet, but the y axis in our line chart
starts at 18 feet.
As a result, the line chart makes the differences
in Curry's shot distances over these nine seasons
look very pronounced maybe too pronounced.
Perspective matters a lot when you're looking at charts
and trying to draw inferences from them.
His median shot distance
has definitely increased over the years.
But the line charts perspective is so narrow
that it exaggerates the increase.
Meanwhile, the column chart
provides a more accurate visual representation
of the increase in Curry's shot distances.
Data, visualizations can deceive us
either intentionally or unintentionally.
numbers don't lie.
But charts can visualize those numbers in a way
that skew our perspective or omit key information.
Using a similar example
to what we just visualized in Power BI.
Take a look at this chart
that was broadcast several years ago
by a major US news channel.
I won't say which one.
I recreated this chart in Excel
to keep them from coming after me.
It shows the number of US Census respondents on welfare
versus those with a full time job.
The y axis is severely truncated,
making it look like there's an enormous gap
between these two populations.
This is how this particular news channel
chose to present this data.
If we take that exact same data and plot it in Excel
with a y axis starting at zero,
we get a more accurate picture of the differences
in these two populations.
And we see that it's not so dramatic after all.
There are many examples of data visualizations
out there in our society that are deceptive.
Sometimes it's just caused by carelessness
and someone not thinking critically
about what their chart says.
That's why I wanna teach good practices in this course,
and show you these examples
so you can avoid the same mistakes.
Other times however, charts can be meant to mislead you,
to lead you to a conclusion
that supports the author's agenda.
So whenever you see a data visualization
being used in a newspaper or magazine article,
a TV news piece or elsewhere,.
Look at it carefully and rationally
and really examine what it trying to tell you
and how we might be misleading you.
That does for our examination
of Stephen Curry's basketball data.
I've given you just the briefest
most basic introduction to the world of sports analytics.
But this is a rapidly growing field
in the United States and Europe,
as teams try to collect data
and find ways to gain a competitive edge,
either on the playing field or in their recruiting.
One book I highly recommend
to anyone who is interested in sports analytics,
or just data analysis in general is
"Moneyball"
by Michael Lewis.
I don't pay much attention to sports,
but I found it to be a fascinating examination
of how organizations can combine data
with strong evidence based reasoning and deduction
to uncover trends that nobody else might be noticing
and use that to affect meaningful change in an organization
even an entire industry.

True or False: bins allow us to divide a column's values into equally-sized


groups, while lists allow us specify which values go into which groups.
True
What are card visuals good for?
Displaying a single measure or value that is important to our análisis

Check all of the scenarios for which we would use the conditional
formatting visualization in Power BI.
Changing the font color/background color of the cell depending on which
numbers in the column are highest or lowest.

Using data bars to show a small graphical representation behind our


numbers.

Using the icons as a kind of warning indicator, and set up rules that say
"when a cell goes below this value, display a big red X symbol".

https://docs.microsoft.com/en-us/power-bi/desktop-conditional-table-
formatting

You might also like