BI Practical

Experiment – 1
Theory: Power BI is a collection of software services, apps, and connectors that work together to
turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Your
data might be an Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data
warehouses. Power BI lets you easily connect to your data sources, visualize and discover what's
important, and share that with anyone or everyone you want.
The desktop application thrives on interactive and dynamic visualizations, embracing principles of
effective data communication. Its utilization of common data visualization theories, including
Edward Tufte's principles of data-ink ratio and minimizing chartjunk, ensures clarity and precision in
conveying information. Power BI Desktop operates on the principle of user empowerment, providing
an intuitive interface for constructing compelling dashboards and reports. The tool aligns with best
practices in information design and usability, facilitating a smooth user experience. Overall, Power BI
Desktop embodies the theory of democratizing data analytics, empowering users to explore,
analyse, and communicate insights effectively, thereby fostering a data-driven decision-making
culture within organizations.
Procedure: -
1. To create a workspace, in the Navigation pane (located at the left), click Workspaces, and
then click Create a Workspace.
2. When prompted to upgrade your account to Power BI Pro, click Try Free. In the Create a
Workspace pane (located at the right), in the Workspace Name box, enter a name for your
workspace. The name you enter must be unique to the tenant. We suggest you name the
workspace Sales Analysis, and that you also append your initials. For example, the name
could be Sales Analysis AB.
7. In Power BI Desktop, at the top-right corner, verify that you see your account.
Experiment – 2
Aim: - Import the data from different sources and load it into the target system.
Software: - Microsoft Power BI
Theory: - Power BI provides several methods for loading data into the target system, offering
flexibility to cater to diverse data sources and scenarios. The primary ways include:
1. Direct Query: This method establishes a live connection to the data source, enabling real-time
data analysis without importing it into Power BI. It is suitable for large datasets where the focus is on
current, up-to-the-minute insights. However, it relies on the source system for query performance.
2. Import Data: The import option involves loading data directly into the Power BI model, which
resides in-memory. This approach is effective for smaller to moderately sized datasets, providing
faster query performance. Users can leverage Power Query Editor for data transformations during
the import process.
3. Power Query Editor: A versatile tool within Power BI, Power Query Editor enables users to
connect to various data sources, apply transformations, and shape the data before loading it into the
model. This method allows for comprehensive data cleaning, filtering, and structuring; a pandas analogy of this import-and-transform pattern is sketched after this list.
4. Dataflows: Dataflows facilitate a Power Query experience in the cloud, where data transformations
are applied in a Power BI service workspace. This provides a reusable and centralized approach to
data preparation, promoting consistency in data processing across reports and dashboards.
5. Power Automate Integration: Power BI can integrate with Power Automate (formerly known as Microsoft Flow) to automate data loading processes. This integration enables users to trigger data refreshes based on predefined schedules, events, or triggers from external systems.
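As referenced above, a loose pandas analogy to the import-and-transform pattern of points 2 and 3. This is a sketch of the idea only, not anything Power BI executes; the file name and column names are hypothetical.

import pandas as pd

# "Import": load a source file into an in-memory table (file name is hypothetical).
sales = pd.read_csv("sales.csv")

# "Transform": shaping steps comparable to Power Query operations
# (column names are illustrative).
sales = sales[sales["Region"] == "West"]                    # filter rows
sales = sales.drop(columns=["InternalCode"])                # remove a column
sales["UnitPrice"] = sales["Revenue"] / sales["Quantity"]   # derive a column

print(sales.head())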
Procedure: -
1. In the Power Query Editor window, in the Queries pane, select the DimEmployee query.
2. To rename the query, in the Query Settings pane (located at the right), in the Name box, replace
the text with Salesperson, and then press Enter. The query name will determine the model table
name. It's recommended to define concise, yet friendly, names.
3. In the Queries pane, verify that the query name has updated.
4. To locate a specific column, on the Home ribbon tab, from inside the Manage Columns group, click
the Choose Columns down-arrow, and then select Go to Column.
5. In the Go to Column window, to order the list by column name, click the AZ sort button, and then
select Name.
7. To filter the query, in the SalesPersonFlag column header, click the down-arrow, and then uncheck
FALSE.
8. Click OK.
9. In the Query Settings pane, in the Applied Steps list, notice the addition of the Filtered Rows step.
10. To remove columns, on the Home ribbon tab, from inside the Manage Columns group, click
Choose Columns.
11. In the Choose Columns window, to uncheck all columns, uncheck the (Select All Columns) item.
12. Check the following columns:
a. EmployeeKey
b. EmployeeNationalIDAlternateKey
c. FirstName
d. LastName
e. Title
f. EmailAddress
13. Click OK.
14. In the Applied Steps list, notice the addition of another query step.
15. To create a single name column, first select the FirstName column header.
16. While pressing the Ctrl key, select the LastName column.
17. Right-click either of the selected column headers, and then in the context menu, select Merge
Columns.
18. In the Merge Columns window, in the Separator dropdown list, select Space.
19. In the New Column Name box, replace the text with Salesperson.
22. Replace the text with EmployeeID, and then press Enter.
23. Use the previous steps to rename the EmailAddress column to UPN.
24. At the bottom-left, in the status bar, verify that the query has five columns and 18 rows.
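For reference, the same shaping steps (filter the salespeople, keep a few columns, merge the names, rename) can be expressed in pandas. This is an analogy only; the small hand-built table stands in for the DimEmployee source, and its values are illustrative.

import pandas as pd

# Stand-in for the DimEmployee source (illustrative rows).
df = pd.DataFrame({
    "EmployeeKey": [1, 2],
    "EmployeeNationalIDAlternateKey": ["N100", "N200"],
    "FirstName": ["Amy", "Ben"],
    "LastName": ["Alberts", "Burton"],
    "Title": ["Sales Rep", "Sales Rep"],
    "EmailAddress": ["amy@example.com", "ben@example.com"],
    "SalesPersonFlag": [True, False],
})

# Filter the query: keep only rows where SalesPersonFlag is TRUE.
df = df[df["SalesPersonFlag"]]

# Merge FirstName and LastName into a single Salesperson column, separated by a space.
df["Salesperson"] = df["FirstName"] + " " + df["LastName"]

# Choose columns, then rename to match the procedure's final names.
df = df[["Salesperson", "Title", "EmployeeNationalIDAlternateKey",
         "EmployeeKey", "EmailAddress"]]
df = df.rename(columns={"EmployeeNationalIDAlternateKey": "EmployeeID",
                        "EmailAddress": "UPN"})

print(df)  # five columns, as in the final query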
Experiment – 3
Theory: - Each loading method in Power BI aligns with theoretical principles of data integration and
ETL, allowing users to choose the most suitable approach based on their data volume, performance
requirements, and real-time analysis needs. In Power BI, the Extract, Transform, Load (ETL) process
is a theoretical framework guiding the construction of a database within the Power BI model.
1. Extract: The process begins with data extraction from various sources, such as databases,
spreadsheets, or cloud services. Power BI supports a wide range of connectors, adhering to
the principle of extracting diverse data types. The extraction aligns with the theoretical
foundation of data connectivity, ensuring compatibility with various source systems.
2. Transform: Power BI's Power Query Editor plays a pivotal role in the transformation
phase. Rooted in theoretical principles of relational algebra, it allows users to shape and
clean the data. The transformations include filtering, merging, and aggregating data,
ensuring data quality and relevance. The M language, based on functional programming
theory, facilitates advanced transformations.
3. Load: The transformed data is loaded into the Power BI model, which follows principles of
relational database theory. The tabular data model organizes data into tables, leveraging
relationships between them. This optimized structure adheres to normalization principles,
ensuring efficient storage, retrieval, and query performance within the model.
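A compact pandas sketch of the three phases (an analogy only: the file and column names are hypothetical, and pandas here stands in for the Power BI engine):

import pandas as pd

# Extract: read from a source system (hypothetical CSV file).
orders = pd.read_csv("orders.csv", parse_dates=["OrderDate"])

# Transform: clean and shape the data.
orders = orders.dropna(subset=["CustomerID"])    # remove incomplete rows
orders["Year"] = orders["OrderDate"].dt.year     # derive a column
summary = orders.groupby("Year", as_index=False)["Amount"].sum()

# Load: persist the transformed table for the reporting model.
summary.to_csv("orders_by_year.csv", index=False)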
Procedure: -
1. Open Power BI and Connect to the Data Source
2. Enter the connection settings for the data source. You may be asked for a user ID and password
upon clicking OK.
3. Select the Tables You Need
4. Transform the Data Based on Your Requirements
a. Create a new query by referencing an existing table
b. Group the data with an aggregation
i. Click Group By
ii. Then, click Advanced. We need to group using 2 columns.
iii. Click Add grouping. When a dropdown list appears, select an appropriate
column (for example, product id).
iv. Then, define the aggregation. Enter Total Premium in New column name.
Then, select the Operation. And last, select the premium column to sum.
v. Finally, click OK to create the grouping. (A pandas sketch of this grouping, and of a relationship expressed as a join, appears after this procedure.)
c. Create model relationships
i. In Power BI Desktop, at the left, click the Model view icon.
ii. If you do not see all seven tables, scroll horizontally to the right, and then
drag and arrange the tables more closely together so they can all be seen at
the same time.
iii. To return to Report view, at the left, click the Report view icon.
iv. To view all table fields, in the Fields pane, right-click an empty area, and
then select Expand All.
v. To create a table visual, in the Fields pane, from inside the Product table,
check the Category field.
vi. To add a column to the table, in the Fields pane, check the Sales | Sales
field.
vii. Notice that the table visual lists four product categories, and that the sales
value is the same for each, and the same for the total.
viii. On the Modelling ribbon tab, from inside the Relationships group, click Manage Relationships.
ix. In the Manage Relationships window, notice that no relationships are yet defined.
x. To create a relationship, click New.
xi. In the Create Relationship window, in the first dropdown list, select the Product table.
xii. Select other appropriate settings like cardinality, filter direction, etc.
xiii. Click OK.
xiv. In the Manage Relationships window, notice that the new relationship is listed, and then click Close.
xv. In the report, notice that the table visual has updated to display different values for each product category.
xvi. Filters applied to the Product table now propagate to the Sales table.
xvii. Switch to Model view, and then notice there is now a connector between the two tables.
xviii. To create a new relationship, from the Reseller table, drag the ResellerKey column on to the ResellerKey column of the Sales table.
xix. Create other relationships as required.
xx. Save the Power BI Desktop file.
d. Create Hierarchy
i. In Model view, in the Fields pane, if necessary, expand the Product table.
ii. To create a hierarchy, in the Fields pane, right-click the Category column, and
then select Create Hierarchy.
iii. In the Properties pane (to the left of the Fields pane), in the Name box, replace
the text with Products.
iv. To add the second level to the hierarchy, in the Hierarchy dropdown list, select
Subcategory.
v. To add the third level to the hierarchy, in the Hierarchy dropdown list, select
Product.
vi. To complete the hierarchy design, click Apply Level Changes.
vii. In the Fields pane, notice the Products hierarchy.
viii. To reveal the hierarchy levels, expand the Products hierarchy.
ix. To organize columns into a display folder, in the Fields pane, first select the
Background Colour Format column.
x. While pressing the Ctrl key, select the Font Colour Format column.
xi. In the Properties pane, in the Display Folder box, enter Formatting.
xii. In the Fields pane, notice that the two columns are now inside a folder.
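As referenced in the grouping step, the following pandas sketch mirrors the Group By aggregation and expresses a model relationship as a join. The column names follow the procedure's example (product id, premium), and the data is illustrative.

import pandas as pd

# Illustrative policy-level rows (column names follow the procedure's example).
policies = pd.DataFrame({
    "product_id": [101, 101, 102, 102, 103],
    "premium": [500.0, 650.0, 300.0, 450.0, 700.0],
})
products = pd.DataFrame({
    "product_id": [101, 102, 103],
    "category": ["Auto", "Home", "Life"],
})

# Group By with an aggregation: Total Premium per product_id.
totals = (policies.groupby("product_id", as_index=False)["premium"].sum()
          .rename(columns={"premium": "Total Premium"}))

# A one-to-many relationship behaves like a join on the shared key column:
# filters on the product side propagate to the aggregated side, as in the report.
report = products.merge(totals, on="product_id")
print(report)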
Experiment – 4
Theory: - Data visualization within the context of Business Intelligence (BI) is intricately tied to the
Extract, Transform, Load (ETL) process, contributing to the effective communication of insights.
Theoretical foundations guide this process.
1. Data Extraction (Extract): In the initial phase of ETL, data is sourced from diverse systems.
Theoretical principles of data compatibility and integration inform decisions on extracting
relevant information from structured and unstructured sources, ensuring data consistency.
2. Data Transformation (Transform): The transformation phase involves shaping and
structuring the data. Theoretical aspects of data cleaning, normalization, and enrichment are
applied, aligning with principles from relational database theory. Transformation also
encompasses the preparation of data for optimal visual representation.
3. Data Loading (Load): Loaded data is organized within BI systems, often adopting principles
from data modelling theories. A well-structured data model facilitates efficient querying and
supports the creation of meaningful visualizations.
4. Visualization Design: Theoretical concepts from visual perception and cognitive psychology
guide the design of data visualizations. Principles like Edward Tufte's data-ink ratio, colour
theory, and Gestalt principles are employed to create visually appealing and informative
dashboards and reports.
5. Interactivity and User Experience: Theoretical underpinnings of human-computer interaction
influence the incorporation of interactive elements. Users can explore data dynamically,
allowing for a more intuitive and engaging experience, aligning with usability and user
experience design theories.
Procedure: -
In Power BI Desktop, open the Retail Analysis Sample PBIX file in Report view. In the Power BI
service, open the Retail Analysis Sample PBIX file and select Edit.
There are various visualization methods available in Power BI; these include:
1. Area Chart
a. Select “+” to add a new page.
b. From the Fields pane, select Sales > Last Year Sales, and This Year Sales > Value.
c. Convert the chart to a basic area chart by selecting the Area chart icon from the
Visualizations pane.
d. Select Time > FiscalMonth to add it to the Axis well.
e. To display the chart by month, select the ellipsis (top right corner of the visual)
and choose Sort by > FiscalMonth. To change the sort order, select the ellipsis again
and select either Sort ascending or Sort descending.
2. Doughnut chart
a. On the Visualizations pane, select the icon for doughnut chart to convert your
bar chart to a doughnut chart. If Last Year Sales isn't in the Values section of the
Visualizations pane, drag it there.
b. Select Item > Category to add it to the Legend area of the Visualizations pane.
3. Card
a. On the Data pane, expand Store and select the Open Store Count checkbox. By
default, Power BI creates a clustered column chart with the single data value. You
can convert the chart to a card visualization.
4. Combo chart
a. From the Fields pane, select Sales > This Year Sales > Value
b. Select Sales > Gross Margin This Year and drag it to the Y-axis well
c. Select Time > FiscalMonth and drag it to the X-axis well
d. The visualization now shows both values by FiscalMonth.
e. Select the ellipsis again and choose Sort axis > Sort ascending.
f. Convert the column chart to a combo chart. There are two combo charts available:
Line and stacked column and Line and clustered column. With the column chart
selected, from the Visualizations pane select the Line and clustered column chart
g. In the upper-right corner of the visual, select the More options ellipsis (...) and select
Sort axis > FiscalMonth
h. From the Fields pane, drag Sales > Last Year Sales to the Line y-axis bucket
i. The combo chart now shows the clustered columns with the Last Year Sales line overlaid. (A matplotlib sketch of a comparable combo chart follows.)
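As noted in the last step, here is a matplotlib sketch of a comparable Line and clustered column chart. The monthly values are illustrative, not taken from the Retail Analysis sample.

import numpy as np
import matplotlib.pyplot as plt

# Illustrative monthly values (not from the Retail Analysis sample).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
this_year = [120, 135, 150, 145, 160, 170]
gross_margin = [30, 33, 36, 35, 39, 41]
last_year = [110, 125, 140, 150, 155, 150]

x = np.arange(len(months))
width = 0.35

fig, ax = plt.subplots(figsize=(8, 4))
# Clustered columns for the two column series.
ax.bar(x - width / 2, this_year, width, label="This Year Sales")
ax.bar(x + width / 2, gross_margin, width, label="Gross Margin This Year")

# Line series on a secondary y-axis, as in the combo chart.
ax2 = ax.twinx()
ax2.plot(x, last_year, color="black", marker="o", label="Last Year Sales")

ax.set_xticks(x)
ax.set_xticklabels(months)
ax.set_xlabel("FiscalMonth")
ax.set_title("Line and clustered column chart (sketch)")
fig.legend(loc="upper left")
plt.show()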
Experiment – 5
Theory:
In Business Intelligence (BI), classification algorithms are employed to analyse and categorize data,
contributing to informed decision-making. The theoretical foundation lies in machine learning,
particularly supervised learning, where algorithms learn patterns from labelled historical data to
predict the class or category of new, unseen data.
1. Training Phase: The classification algorithm is initially trained on a labelled dataset, learning
relationships between input features and predefined classes. This phase aligns with statistical and
probabilistic theories, aiming to identify patterns and correlations within the data.
2. Feature Selection and Engineering: Theoretical principles of feature selection and engineering are
applied to optimize the model's ability to discriminate between classes. Relevant features are
chosen based on their significance, aligning with information theory and data dimensionality
reduction principles.
3. Model Construction: The algorithm constructs a predictive model, often based on theoretical
frameworks such as decision trees, support vector machines, or neural networks. These models
encapsulate the learned relationships and form the basis for classifying new data.
4. Evaluation and Validation: Theoretical concepts from statistics guide the evaluation and validation
of the classification model. Metrics like precision, recall, and accuracy are employed to assess the
model's performance and generalization to new, unseen data (a scikit-learn sketch of this cycle
follows the list).
5. Deployment in BI: The trained and validated classification model is integrated into BI systems,
where it classifies and categorizes incoming data. This theoretical application of machine learning
enhances BI capabilities, providing insights into customer segmentation, fraud detection, risk
assessment, and other classification-based business scenarios.
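To make the training and evaluation cycle concrete, the following scikit-learn sketch trains a classifier on labelled data and reports the metrics named in point 4. It is a generic illustration, separate from the time-series procedure below; the bundled breast-cancer dataset is an assumed example.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Labelled historical data (assumed example dataset bundled with scikit-learn).
X, y = load_breast_cancer(return_X_y=True)

# Training phase: learn patterns from the labelled training split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# Evaluation and validation: assess generalization on unseen data.
y_pred = model.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))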
Procedure:
import pandas as pd
import matplotlib.pyplot as plt

# Define the rainfall data as a Python list.
rainfall = [799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071]

# Create a pandas DataFrame with a DatetimeIndex to represent the time series data.
# Start from January 2012 with month-start frequency, giving twelve monthly dates.
start_date = "2012-01-01"
end_date = "2012-12-01"
date_range = pd.date_range(start=start_date, end=end_date, freq='MS')
rainfall_df = pd.DataFrame(rainfall, index=date_range, columns=["Rainfall"])

# Print the time series data.
print(rainfall_df)

# Create a plot of the time series.
plt.figure(figsize=(8, 4))  # Define the plot figure size (optional).
plt.plot(rainfall_df.index, rainfall_df['Rainfall'], marker='o', linestyle='-')
plt.xlabel('Date')
plt.ylabel('Rainfall')
plt.title('Monthly Rainfall Time Series')
plt.grid(True)

# Save the plot to a file.
plt.savefig("rainfall.png")

# Show the plot (optional).
plt.show()
Output:
When we execute the above code, it prints the twelve monthly values and saves the line chart to rainfall.png.
Experiment – 6
Theory:
A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It's a graphical representation of a decision-making process and is particularly
intuitive and interpretable. Key Features of Decision Trees can be explained as follows:
1. Tree Structure: A Decision Tree consists of nodes, branches, and leaves. Each node represents a
decision or test on an attribute, each branch represents an outcome of that decision, and each leaf
represents a class label (in classification) or a predicted value (in regression).
2. Recursive Partitioning: The process of building a Decision Tree involves recursively splitting the
dataset into subsets based on the values of the most informative attributes. The goal is to create
subsets that are as pure as possible with respect to the target variable.
3. Attribute Selection: At each node, the algorithm selects the attribute that provides the best split,
typically using criteria like Information Gain (for classification) or Mean Squared Error (for
regression); the standard definitions are given after this list.
4. Pruning: Decision Trees can be prone to overfitting, where they capture noise in the data. Pruning
is a technique used to prevent overfitting by removing branches that do not significantly improve the
tree's performance on a validation dataset.
5. Interpretability: One of the biggest advantages of Decision Trees is their interpretability. You can
easily visualize and understand the decision-making process, making them valuable for explaining
model predictions.
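For reference, the standard definitions behind Information Gain (as noted in point 3): with p_i the proportion of class i among the examples S at a node, and S_v the subset of S for which attribute A takes value v,

H(S) = -\sum_{i} p_i \log_2 p_i
\qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

The attribute with the highest gain is chosen for the split at each node.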
Procedure:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
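# The original procedure stops at the imports; the remainder below is a
# minimal sketch that completes the script. The Iris dataset is an assumed
# example, not specified in the original.
from sklearn.datasets import load_iris

# Load labelled example data.
iris = load_iris()
X, y = iris.data, iris.target

# Fit a decision tree classifier (max_depth limits overfitting; see pruning above).
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X, y)

# Visualize the fitted tree: internal nodes show attribute tests, leaves show classes.
plt.figure(figsize=(10, 6))
plot_tree(model, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()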
Output:
Executing the script displays a plot of the fitted decision tree.
Experiment – 7
Theory:
K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a
dataset into distinct, non-overlapping groups or clusters. The algorithm aims to group similar data
points together while keeping different groups as separate as possible. It operates through an
iterative process, where centroids (representative points) of clusters are adjusted until convergence.
2. Anomaly Detection: K-means can identify outliers or anomalies in datasets, helping businesses
detect irregularities or potential fraud within transactions, network activity, or operational
processes.
4. Pattern Recognition: K-means can uncover patterns or trends in large datasets, aiding BI analysts
in understanding complex data relationships. This facilitates data-driven decision-making and
strategic planning.
5. Document Clustering: In text analytics, K-means can group documents or articles based on content
similarity, supporting content categorization, topic modelling, and information retrieval in BI
systems.
Procedure:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
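# The procedure as given omits the data and the model-fitting steps, so the
# block below is a minimal sketch filling that gap; the Iris dataset and the
# colour/marker choices are assumptions, not from the original.
iris = datasets.load_iris()
X = iris.data[:, :2]  # keep two features so the clusters can be plotted

# Fit k-means with three clusters; collect assignments and centroids.
k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
cluster_assignments = kmeans.fit_predict(X)
cluster_centers = kmeans.cluster_centers_

# Per-cluster plotting styles used by the loop below.
colors = ['r', 'g', 'b']
markers = ['o', 's', '^']

plt.figure(figsize=(8, 5))

# Plot each cluster with its own colour and marker.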
for i in range(k):
plt.scatter(X[cluster_assignments == i][:, 0], X[cluster_assignments == i][:, 1],
c=colors[i], marker=markers[i], label=f'Cluster {i + 1}')
# Plot centroids
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], c='k', marker='x', s=100, label='Centroids')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.title('K-means Clustering (k=3)')
plt.show()