NumPy Interview Questions with Real-time Scenarios for Power BI and SQL
1. Scenario: Pre-aggregating data before Power BI
Q: You have a large numeric dataset stored as a NumPy array in a .npy file. How would you calculate the average
revenue per region?
A:
import numpy as np
data = np.load('revenue_data.npy') # Shape: (10000, 2) -> [RegionID, Revenue]
region_ids = np.unique(data[:, 0])
avg_revenue = {region: data[data[:, 0] == region][:, 1].mean() for region in region_ids}
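The dictionary comprehension above rescans the full array once per region. As a minimal sketch of a single-pass alternative, np.unique with return_inverse plus np.bincount can produce every group mean at once. The small in-memory array is a made-up stand-in for revenue_data.npy, not real data.

```python
import numpy as np

# Hypothetical stand-in for np.load('revenue_data.npy'): columns are [RegionID, Revenue]
data = np.array([
    [1, 100.0],
    [2, 250.0],
    [1, 300.0],
    [3, 50.0],
    [2, 150.0],
])

# region_ids: sorted unique IDs; inverse: position of each row's region in region_ids
region_ids, inverse = np.unique(data[:, 0], return_inverse=True)

# Sum revenue per region, then divide by the number of rows in each region
revenue_sums = np.bincount(inverse, weights=data[:, 1])
row_counts = np.bincount(inverse)
avg_revenue = dict(zip(region_ids.tolist(), (revenue_sums / row_counts).tolist()))
```

The result has the same shape as the dictionary in the answer above (region ID mapped to mean revenue), so it could be written out for Power BI in the same way.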
2. Scenario: Replacing Excel calculations with NumPy
Q: How can you calculate standard deviation for sales in NumPy?
A:
sales = np.array([100, 200, 150, 175])
std_dev = np.std(sales) # Population std (ddof=0), like Excel STDEV.P; pass ddof=1 to match STDEV.S
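When replacing an Excel formula, it helps to know which divisor is in play: np.std divides by n unless ddof is set. A short sketch comparing the two conventions on the same sales values:

```python
import numpy as np

sales = np.array([100, 200, 150, 175])

population_std = np.std(sales)       # ddof=0: divides by n (Excel STDEV.P)
sample_std = np.std(sales, ddof=1)   # ddof=1: divides by n - 1 (Excel STDEV.S)

# The two differ by a factor of sqrt(n / (n - 1)), which matters most for small n
ratio = sample_std / population_std
```

With only four values the gap between the two results is noticeable, so a dashboard figure can visibly change depending on which one is used.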
3. Scenario: Detecting Outliers Before Power BI Import
Q: How would you detect outliers in a SalesAmount array using NumPy?
A:
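sales = np.array([100, 110, 105, 120, 1300])
# A mean/std rule (|x - mean| > 2 * std) flags nothing here: the outlier inflates
# the std, and with 5 points no population z-score can exceed sqrt(n - 1) = 2.
# The IQR rule is not masked by the outlier in the same way.
q1, q3 = np.percentile(sales, [25, 75])
iqr = q3 - q1
outliers = sales[(sales < q1 - 1.5 * iqr) | (sales > q3 + 1.5 * iqr)]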
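The mean and standard deviation are themselves pulled by extreme values, so a robust check is sometimes run alongside them. The following is a minimal sketch of the modified z-score, which uses the median and the median absolute deviation (MAD). The 0.6745 scaling constant and the 3.5 cutoff are conventional choices assumed here, not requirements.

```python
import numpy as np

sales = np.array([100, 110, 105, 120, 1300])

median = np.median(sales)
mad = np.median(np.abs(sales - median))  # median absolute deviation

# Modified z-score; assumes mad > 0 (a constant array would need separate handling)
modified_z = 0.6745 * (sales - median) / mad
robust_outliers = sales[np.abs(modified_z) > 3.5]
```

Because the median and MAD barely move when one value is extreme, this check stays usable on small samples such as the five-element array above.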
4. Scenario: Optimizing Memory Usage
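Q: How can NumPy load and filter a large, purely numeric CSV file without building a Pandas DataFrame?
A:
data = np.genfromtxt('sales.csv', delimiter=',', skip_header=1) # Numeric columns only; text fields load as NaN
filtered_data = data[data[:, 2] > 1000] # Keep rows whose third column exceeds 1000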
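Memory use depends heavily on the dtype chosen at load time. The sketch below loads the same CSV text as float64 (the default) and as float32 and compares the array sizes. The column names and values are invented for illustration, and float32 is assumed to be precise enough for the data, which may not hold for large monetary amounts.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for sales.csv (header + 3 numeric columns)
csv_text = "order_id,region_id,amount\n1,10,500.0\n2,20,1500.0\n3,10,2500.0\n"

data64 = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
data32 = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1,
                       dtype=np.float32)

# nbytes is the size of the array buffer; float32 needs half the bytes of float64
bytes64, bytes32 = data64.nbytes, data32.nbytes

filtered = data32[data32[:, 2] > 1000]  # rows whose third column exceeds 1000
```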
5. Scenario: Vectorized Calculations for BI Dashboards
Q: Calculate profit margin using NumPy.
A:
revenue = np.array([1000, 1200, 1100])
cost = np.array([600, 700, 800])
profit_margin = (revenue - cost) / revenue
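Real data often contains rows with zero revenue, where the plain division produces inf or NaN along with a runtime warning. One way to guard against that, sketched here with a made-up zero-revenue row, is np.divide with its where and out arguments:

```python
import numpy as np

revenue = np.array([1000.0, 0.0, 1100.0])  # second entry is a made-up zero-revenue row
cost = np.array([600.0, 50.0, 800.0])

# Divide only where revenue is non-zero; other positions keep the NaN fill value
profit_margin = np.divide(
    revenue - cost,
    revenue,
    out=np.full_like(revenue, np.nan),
    where=revenue != 0,
)
```

Whether NaN or some other fill value is appropriate for such rows is a reporting decision; NaN is only the choice assumed in this sketch.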
6. Scenario: SQL Aggregate Function in NumPy
Q: Translate SQL aggregate functions to NumPy:
SELECT COUNT(*), SUM(Sales), AVG(Sales) FROM Sales WHERE Region = 'West'
A:
region = np.array(['West', 'East', 'West', 'North'])
sales = np.array([200, 150, 300, 100])
west_sales = sales[region == 'West']
count = west_sales.size
total = west_sales.sum()
average = west_sales.mean()
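One difference from SQL shows up when the filter matches no rows: SQL returns NULL for SUM and AVG, while mean() on an empty array yields NaN with a warning. The helper below, aggregate_sales, is a hypothetical name introduced only for this sketch; it returns None in that case to mirror NULL.

```python
import numpy as np

def aggregate_sales(region, sales, target):
    """Return (count, total, average) for rows where region == target."""
    selected = sales[region == target]
    count = int(selected.size)
    if count == 0:
        return 0, None, None  # None mirrors SQL's NULL for SUM and AVG
    return count, selected.sum(), selected.mean()

region = np.array(['West', 'East', 'West', 'North'])
sales = np.array([200, 150, 300, 100])

west = aggregate_sales(region, sales, 'West')
south = aggregate_sales(region, sales, 'South')  # no matching rows
```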
7. Scenario: Interfacing NumPy with Power BI via Python script
Q: Use NumPy inside Power BI's Python script editor to calculate Z-score.
A:
import pandas as pd
import numpy as np
df = dataset # 'dataset' is input from Power BI
sales = df['Sales'].to_numpy()
df['Z_Score'] = (sales - np.mean(sales)) / np.std(sales)
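If every value in the Sales column is identical, the standard deviation is zero and the division above fails to produce usable scores. A small helper (z_scores is a hypothetical name, and returning zeros for a constant column is an assumed convention) makes the calculation safe to reuse:

```python
import numpy as np

def z_scores(values):
    """Population z-scores; returns zeros when every value is identical."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std

scores = z_scores([100, 120, 140, 160])  # hypothetical Sales values
flat = z_scores([50, 50, 50])            # constant column: no division by zero
```

Inside the Power BI script this could be applied as df['Z_Score'] = z_scores(df['Sales'].to_numpy()), assuming the column has no missing values.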
8. Scenario: Use NumPy for Time Series Transformation
Q: How would you implement a 1-step lag in NumPy like DAX PREVIOUSMONTH?
A:
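sales = np.array([100, 120, 140, 160], dtype=float) # Float dtype: an integer array cannot store NaN
lagged_sales = np.roll(sales, 1) # Shifts right by one; the last value wraps around to index 0
lagged_sales[0] = np.nan # Overwrite the wrapped value: the first period has no previous period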
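The same idea generalizes to a lag of any number of periods. The sketch below wraps it in a hypothetical lag helper that avoids np.roll's wrap-around by filling a NaN array and copying the shifted values in. It assumes steps is a non-negative integer.

```python
import numpy as np

def lag(values, steps=1):
    """Shift values forward by `steps` positions, padding the start with NaN."""
    values = np.asarray(values, dtype=float)  # float so NaN can be stored
    lagged = np.full_like(values, np.nan)
    if steps < len(values):
        lagged[steps:] = values[:len(values) - steps]
    return lagged

sales = np.array([100, 120, 140, 160])
lagged_sales = lag(sales, 1)
change = sales - lagged_sales  # period-over-period difference; NaN in the first slot
```

The change array is the kind of period-over-period measure that would otherwise be built in DAX on top of the previous period's value.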