0% found this document useful (0 votes)
43 views29 pages

Interview Questions

Uploaded by

vedame3254
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views29 pages

Interview Questions

Uploaded by

vedame3254
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Data Analyst /

Business Analyst
Interview Questions
Mazher Khan - IIT (BHU) - B.Tech (DR-2)
Senior Data Analyst @TARGET | Ex - OLX (EU)
YouTube - 30M+ (Views) l LinkedIn 20k+

Book for Career Guidance,


https://topmate.io/mazher_khan
CV review & interview tip
Book 1:1 Mentorship Plan -
https://forms.gle/YTjGh4Y11DLSqpdW6
1,3, 6 Months

Follow me on Linkedin https://www.linkedin.com/in/mazher-khan/

Follow on Instagram https://www.instagram.com/khan.the.analyst

Follow on Youtube https://www.youtube.com/@khan.the.analyst

Follow Me on Nas Data


https://nas.io/khan.the.analyst
Analytics Community

Telegram Link- https://t.me/+XTjv6r80eDc5ZWU1


Interview Guide
A Blueprint for Success

Breaking into the data analytics field might seem daunting, but with the right focus, it’s absolutely
achievable. After numerous interviews and countless hours of preparation, I’ve realized that
prioritizing the right skills is key.

Here’s the roadmap that worked for me and can help you stand out:

SQL Mastery:
SQL is the bread and butter of data analytics. Focus on:
• Writing efficient queries.
• Solving real-world case studies.
• Practicing advanced topics like window functions and CTEs.

Python Basics:
Don’t aim to be a Python expert, but know how to:
• Handle data cleaning with Pandas.
• Use basic visualizations with Matplotlib or Seaborn.

BI Tool Proficiency:
Power BI or Tableau knowledge can make you shine. Learn to:
• Build dashboards.
• Present actionable insights.
• Create visually engaging reports.

Excel for Everyday Analytics:


Excel remains essential, especially for:
• Pivot tables and lookups.
• Data wrangling and quick visualizations.
• Handling last-minute analysis with ease.

Statistics Fundamentals:
You don’t need to be a mathematician, but understanding:
• Mean, median, mode, and standard deviation.
• Hypothesis testing and confidence intervals.
• Basics of probability will give you an edge.

Key to Success: Practice these skills daily. Simulate business problems, analyze datasets, and
never stop learning.
Company Wise Interview Question:

Power BI:

1) What is a Factless Fact table?


2) Can you give an example of Slowly Changing Dimensions?
3) What is Append, and what is the necessary condition to append queries?
4) What are Relationship Modifiers in Power BI?
5) What is Incremental Refresh, and will you actually implement it in your model?
6) Can you tell any use case of using Bookmarks?
7) How can you differentiate between Related and Lookupvalue DAX functions?
8) What error do we get when we have many-to-many relationships between two tables?
9) What are the limitations of using Direct Query connection mode reports?
10) What is a Decomposition Tree visual, and how is it useful?
11) How can you change the order of the values displayed on the X-axis of a Column chart as per
the requirement?
12) Can we implement RLS (Row Level Security) in Power BI?
13) What is the difference between calculated columns and measures?
14) How do you handle performance optimization in Power BI?
15) What is a surrogate key, and why do we use it in Power BI?
16) How can you create a custom tooltip in Power BI?
17) How do you handle large datasets in Power BI?
18) Can you explain what a KPI visual is and how it is used?
19) What is the role of Hierarchies in Power BI?
20) How can you implement drillthrough in Power BI reports?

SQL:

1) Write a query to calculate the total revenue generated by each region.


2) Display the names of employees who have a salary above the average salary in their
department.
3) Identify the second highest salary in each department from the 'employees' table, which has
'emp_id', 'department_id', and 'salary' columns.
4) Write a SQL query to find employees who have not had any recent sales in the last 3 months.

Python:

1) Write a Python function to filter out customers who have made more than 5 purchases in the
last 6 months.
2) Create a program that reads a CSV file, extracts unique values in a column, and saves the
results in a new file.
3) Develop a Python script to visualize monthly sales trends for a dataset using Matplotlib or
Seaborn.

--------------------------------------------------------------------------------------------------------------------------------
SQL:

1) Calculate the average salary for each department from the table.
2) Write a SQL query to display the employee’s name along with their manager’s name using a self-
join on the ‘employees’ table, which contains ‘emp_id’, ‘name’, and ‘manager_id’ columns.
3) Find the most recent hire for each department (solved using LEAD/LAG functions).
4) Write a query to retrieve the nth highest salary from the Employees table, which has
‘EmployeeID’, ‘Name’, and ‘Salary’ columns.

Power BI:

1) What is meant by Filter context in DAX?


2) Explain the process of implementing Row-Level Security (RLS) in Power BI.
3) Describe the different types of filters available in Power BI.
4) What’s the difference between the ‘ALL’ and ‘ALLSELECTED’ functions in DAX?
5) How would you use DAX to calculate total sales for a specific product?

Python:

1) Create a dictionary, add elements, update a specific entry, and print the dictionary sorted by key
in alphabetical order.
2) Identify unique values from a list of numbers and print how many times each value occurs.
3) Find and print the duplicate values in a list of numbers, along with their frequency.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Calculate the cancellation rate for each room type over the last 6 months, considering only
bookings with a minimum stay of 2 nights.
2) Determine the average conversion rate (confirmed bookings vs. search events) for users grouped
by their country and device type.
3) Identify properties that have consistently underperformed compared to the average booking rate
of their region over the last 12 months.
4) Detect instances of demand surge where the number of bookings in an hour exceeds the hourly
average by more than 50%.
5) What challenges might arise when querying sharded databases, especially for calculating global
metrics like average booking rates?
6) Explain how you would handle booking timestamps originating from different time zones when
querying for global daily booking patterns.
7) how would you balance normalization for data integrity and denormalization for query
performance?
8) If two systems simultaneously update the same booking record, what mechanisms would you use
in SQL to prevent data conflicts and ensure consistency?
9) Explain the scenarios where window functions outperform traditional group-by clauses in SQL.

Guesstimate Questions:

1) Estimate the total number of hotel bookings made globally in a day on Booking.com
. Explain the factors and assumptions you would consider to arrive at your estimate.
2) How many unique users do you think search for flights on Booking.com
in a month? Provide a structured approach to your calculation.

Case Study Questions:

1) You notice a sudden drop in conversion rates (from search to booking) for hotels in a particular
city. How would you investigate the root cause and propose solutions?
2) Booking.com is launching a new feature that allows users to book multi-city trips. How would you
measure the success of this feature post-launch, and what metrics would you track to ensure its
adoption and profitability?

SQL:

1) Identify customers who have invested in at least two funds with opposite performance trends
(one increasing and the other decreasing) over the last 6 months.
2) Write a query to calculate the year-to-date portfolio returns for each client, ensuring that the
query can handle daily transactions across multiple funds.
3) Find the top 5 performing funds within each region based on their weighted average returns,
accounting for the size of investments in each fund.
4) Write a query to detect and list transactions that may indicate potential duplication (e.g.,
same client, fund, amount, and timestamp within 5 minutes).
5) Discuss the implementation and use of materialized views for financial dashboards. How
would you ensure they are updated efficiently in near real-time?
6) Explain the ACID properties of SQL transactions and their importance in a financial database
handling millions of trades daily.
7) Describe how you would design a sharding strategy for a massive transactional database
storing global trading data.
8) Explain the role of indexing in optimizing complex joins and aggregations for a dataset with
billions of financial records. Provide examples of when over-indexing might degrade
performance.

Python:

1) Write a Python function to find the second largest element in a list without using built-in
sorting functions.
2) Implement a program that takes a dictionary of funds (keys: fund names, values: returns) and
returns the name of the fund with the highest return.
3) Write a Python script to remove duplicates from a list of client IDs while maintaining the
original order.
4) Develop a function to merge two dictionaries containing client portfolios, summing the
values of common keys (assets).
5) Explain the differences between Python’s collections.defaultdict and a standard dictionary.
When would you use defaultdict in financial data processing?
6) Discuss how Python’s multiprocessing module can be used to improve the performance of a
script analyzing high-frequency trading data.
7) How would you use Python’s itertools module to generate all possible portfolio combinations
from a list of 10 assets? Discuss the computational challenges of such tasks.
8) Explain the use of Python’s decorators to log execution time and memory usage for functions
processing large-scale financial data.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) You have a table Accounts with columns AccountID, CustomerID, Balance, and
LastTransactionDate. Write a query to identify accounts that have been inactive for more than 12
months.
2) Given a table Transactions with columns TransactionID, AccountID, Amount, and TransactionDate,
find the top 3 accounts with the highest total transaction volume for each month.
3) A table LoanApplications contains columns ApplicationID, CustomerID, LoanAmount,
ApprovalStatus, and ApplicationDate. Write a query to calculate the average loan amount for
approved applications submitted in the last six months.
4) Explain how a clustered index works and how it differs from a non-clustered index.
5) Describe a scenario where you would use a self-join in SQL and explain its benefits.

Python:

1) Write a Python script to process a JSON file containing customer data and convert it into a
structured DataFrame.
2) Develop a function to calculate the moving average of a given numerical column in a dataset.
3) Explain how you would use Python to validate and clean data received from an external source
(e.g., removing duplicates, handling null values, etc.).
4) Write a Python program to identify and report outliers in a numerical dataset using the
interquartile range (IQR) method.

-------------------------------------------------------------------------------------------------------------------------------------
SQL Questions:

1) Average Post Hiatus


Given a table of Facebook posts, for each user who posted at least twice in 2024, write a SQL query
to find the number of days between each user’s first post of the year and last post of the year in the
year 2024. Output the user and number of days between each user’s first and last post.

• posts table: user_id (integer), post_id (integer), post_date (timestamp), post_content (text)

2) Facebook Power Users


A Facebook power user is defined as someone who posts a ton and gets a lot of reactions on their
post. For the purpose of this question, consider a Facebook power user as someone who posts at
least twice a day and receives an average of 150 comments and/or reactions per post. Write a SQL
query to return the IDs of all Facebook power users, along with the number of posts, and the average
number of reactions per post.

• user_post table: user_id (integer), post_id (integer), post_date (timestamp)


• post_interactions table: post_id (integer), comments (integer), reactions (integer)

3) Active User Retention


Assume you’re given a table containing information on Facebook user actions. Write a SQL query to
obtain the number of monthly active users (MAUs) in July 2022, including the month in numerical
format “1, 2, 3”. Hint: An active user is defined as a user who has performed actions such as ‘sign-in’,
‘like’, or ‘comment’ in both the current month and the previous month.

• user_actions table: user_id (integer), event_id (integer), event_type (string), event_date (datetime)

4) Facebook Friend Recommendations


Facebook wants to recommend new friends to people who show interest in attending 2 or more of
the same private Facebook events. Write a SQL query to find pairs of friends to be recommended to
each other if they’re interested in attending 2 or more of the same private events.
Notes:
• A user interested in attending would have either ‘going’ or ‘maybe’ as their attendance status.
• Friend recommendations are unidirectional, meaning if user x and user y should be recommended
to each other, the result table should have both user x recommended to user y and user y
recommended to user x.
• The result should not contain duplicates (i.e., user y should not be recommended to user x
multiple times).

• friendship_status table: user_a_id (integer), user_b_id (integer), status (enum: ‘friends’,


‘not_friends’)
• event_rsvp table: user_id (integer), event_id (integer), event_type (enum: ‘public’, ‘private’),
attendance_status (enum: ‘going’, ‘maybe’, ‘not_going’), event_date (date)

5) Average Number of Shares per Post


As a data analyst at Facebook, you are asked to find the average number of shares per post for each
user.

• user_posts table: post_id (integer), user_id (integer), post_text (text), post_date (timestamp)
• post_shares table: share_id (integer), post_id (integer), share_date (timestamp)

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Write a query to find the second-highest salary in a department. You might use ROW_NUMBER()
or DENSE_RANK() to achieve this.
2) Create a query to calculate the total number of transactions per user for each day. This typically
involves GROUP BY and COUNT() for aggregation.
3) Write a query to select projects with the highest budget-per-employee ratio from two related
tables (projects and employees). This tests your ability to handle complex joins and aggregations.

Power BI:

1) Explain the difference between Import and Direct Query modes. Which would you choose for
large datasets? (Direct Query enables real-time data but may be slower, whereas Import is faster but
static.)
2) What are slicers, and how do they differ from visual-level filters? Discuss their impact on data in a
Power BI dashboard.
3) How do you implement Row-Level Security (RLS) in Power BI? Explain how you would restrict data
access to specific users or groups.
4) What is a paginated report, and when would you use it? These are ideal for multi-page outputs
like invoices or billing statements.

Python:

1) Write a Python script to identify unique values in a list and count their occurrences. This tests your
understanding of sets and dictionaries.
2) How would you use pandas to merge two datasets and calculate total sales for products with valid
promotions? This involves merge(), groupby(), and basic data analysis functions.
3) Explain the differences between lists, tuples, sets, and dictionaries in Python, highlighting their use
cases in data manipulation and analysis.

--------------------------------------------------------------------------------------------------------------------------------------
SQL:

1) You are given a table EmployeeLogs with columns EmployeeID, LoginTime, LogoutTime, and Date.
Write a query to calculate the longest continuous working streak (consecutive days without missing a
login) for each employee.
2) A table SalesData contains columns TransactionID, Region, ProductID, SaleDate, and Revenue.
Write a query to identify the top 3 products with the highest cumulative revenue in each region over
the last fiscal year.
3) Explain the concept of window functions and how they differ from aggregate functions, with
examples of use cases.
4) Describe how ACID properties are maintained in modern relational databases and the challenges
associated with distributed systems.
5) Explain how different types of indexes (e.g., clustered, non-clustered) impact query performance
in large datasets.

Python:

1) Write a Python program to calculate the cosine similarity between two large datasets represented
as sparse vectors stored in a database.
2) You have a multi-gigabyte CSV file that doesn’t fit into memory. Write a Python script to calculate
the average value of a specific column without loading the entire file into memory.
3) Implement a Python class that dynamically creates a hierarchical data structure from a flat table
with ID, ParentID, and Value columns.

--------------------------------------------------------------------------------------------------------------------------------------

SQL Questions:

1) Apple has a trade-in program where their customers can return the old iPhone device to Apple
and Apple gives the customers the trade-in value (known as payout) of the device in cash.
Task: For each store, write a query of the total revenue from the trade-in. Order the result by
descending order.

Tables:

trade_in_transactions: transaction_id (integer), model_id (integer), store_id (integer),


transaction_date (date)
trade_in_payouts: model_id (integer), model_name (string), payout_amount (integer)
2) The Apple retention team needs your help to investigate buying patterns.
Task: Write a query to determine the percentage of buyers who bought AirPods directly after they
bought iPhones. Round your answer to a percentage (i.e., 20 for 20%, 50 for 50) with no decimals.

Clarifications:

The users were interested in buying iPhones and then AirPods, with no intermediate purchases in
between.
Users who buy iPhones and AirPods at the same time, with the iPhone logged first, can still be
counted.

Tables:
transactions: transaction_id (integer), customer_id (integer), product_name (string),
transaction_timestamp (datetime)

3) Given a reviews table containing data related to product reviews submitted by users over time:
Task: Write a SQL query to calculate the monthly average rating for each Apple product.

Tables:

reviews: review_id (integer), user_id (integer), product_id (integer), submit_date (datetime), stars
(integer)

4) The Apple Customer Retention Data Science team needs your help to investigate buying patterns
related to AirPods and iPhones.
Task: Write a SQL query to determine the percentage of buyers who bought AirPods directly after
they bought iPhones. Round your answer to a percentage (i.e., 20 for 20%, 50 for 50) with no
decimals.

Clarifications:

Buyers should purchase AirPods directly after iPhones, with no intermediate purchases.
Simultaneous purchases (with iPhone logged first) are counted.

Tables:

transactions: transaction_id (integer), customer_id (integer), product_name (string),


transaction_timestamp (datetime)

--------------------------------------------------------------------------------------------------------------------------------------
SQL:

1) Write an SQL query to extract the third transaction of every user, displaying user ID, spend, and
transaction date.
2) Calculate the average ratings for each driver across different cities using data from rides and
ratings tables.
3) Create an SQL query to identify customers registered with Gmail addresses from the 'users'
database.
4) What does database denormalization mean?
5) Analyze the click-through conversion rates using data from ad_clicks and cab_bookings tables.
6) Define a self-join and provide a scenario for its application.

Scenario-Based Question:

1) What is the probability that at least two of three recommended routes for drivers are the fastest,
given a 70% success rate?

Guesstimate Questions:

1) Estimate the number of Uber drivers in Delhi.


2) How many Uber cabs leave Bengaluru Airport in a day?

--------------------------------------------------------------------------------------------------------------------------------------

SQL Questions:

1) Most Popular Client ID:


Select the most popular client_id based on a count of the number of users who have at least 50% of
their events from the following list: ‘video call received’, ‘video call sent’, ‘voice call received’, ‘voice
call sent’.

2) Desktop-Only Users:
Write a query that returns the company (customer_id column) with the highest number of users that
use desktop only.

3) Bottom Companies by Mobile Usage:


Write a query that returns a list of the bottom 2 companies by mobile usage. Company is defined in
the customer_id column. Mobile usage is defined as the number of events registered on a client_id
== 'mobile'. Order the result by the number of events ascending. In the case where there are
multiple companies tied for the bottom ranks (rank 1 or 2), return all the companies. Output the
customer_id and number of events.

4) Exclusive Users per Client:


Write a query that returns a number of users who are exclusive to only one client. Output the
client_id and number of exclusive users.

5) Unique Users per Client per Month:


Write a query that returns the number of unique users per client per month.

6) Monthly User Share (New vs. Existing):


Calculate the share of new and existing users for each month in the table. Output the month, share
of new users, and share of `existing users as a ratio. New users are defined as users who started using
services in the current month (there is no usage history in previous months). Existing users are users
who used services in the current month but also used services in any previous month. Assume that
the dates are all from the year 2020.

Table Structure
• id: int64
• time_id: datetime64[ns]
• user_id: object
• customer_id: object
• client_id: object
• event_type: object
• event_id: int64

--------------------------------------------------------------------------------------------------------------------------------------

Power BI:

1) Explain the concept of context transition in DAX and provide an example.


2) How would you optimize a complex Power BI report for faster performance?
3) Describe the process of creating and using calculation groups in Power BI.
4) Explain how you would handle large datasets in Power BI without compromising performance.
5) What is a composite model in Power BI, and how can it be used effectively?
6) How does the USERELATIONSHIP function work, and when would you use it?
7) Describe how to use Power Query M language for advanced data transformations.
8) Explain the difference between CROSSFILTER and TREATAS in DAX.

SQL:

1) How would you optimize a slow-running query with multiple joins?


2) What is a recursive CTE, and can you provide an example of when to use it?
3) Explain the difference between clustered and non-clustered indexes and when to use each.
4) Write a query to find the second highest salary in each department.
5) How would you detect and resolve deadlocks in SQL?
6) Explain window functions and provide examples of ROW_NUMBER, RANK, and DENSE_RANK.
7) Describe the ACID properties in database transactions and their significance.
8) Write a query to calculate a running total with partitions based on specific conditions.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using
example.
Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary)
2) Find the nth highest salary from the Employee table.
3) You have an employee table with employee ID and manager ID. Find all employees under a specific
manager, including their subordinates at any level.
4) Write a query to find the cumulative salary of employees department-wise, who have joined
company in last 30 days.
5) Find the top 2 customers with the highest order amount for each product category, handling ties
appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)

Power BI:

1) Explain step-by-step how will you create a sales dashboard from scratch.
2) Explain how you can optimize a slow Power BI report.
3) Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data.

Behavioural:

1) Why you want to become a data analyst and why did you apply to this company?
2) Describe a time when you had to manage a difficult task with tight deadlines. How did you handle
it?

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Identify the top 5 products that have shown the highest increase in weekly sales over the last
quarter.
2) Calculate the percentage of out-of-stock items for each store and identify the stores that exceed
20% out-of-stock items on average.
3) Find products that were consistently sold in every store across a region but saw no sales in at least
one store last month.
4) Design a query to track customer retention by calculating how many customers made purchases
every month for the last 6 months.
5) Explain how indexing works in SQL and how you would use it to optimize a query that involves
multiple joins on a large dataset of store transactions.
6) Discuss how you would manage and query a database containing billions of rows of sales data
across multiple time zones.
7) In the case of seasonal promotions, how would you design an SQL query to measure the
effectiveness of discounts on specific product categories?
8) Explain the difference between OLTP and OLAP databases, and provide examples of how Target
might use each for its operations.

Python:

1) Write a function to calculate the total sales of a product, given a list of daily sales data.
2) Develop a script that reads a dictionary containing product IDs and their stock counts, and flags
products that have stock below a certain threshold.
3) Write a Python function to generate a weekly sales report from a CSV file containing transaction
data (columns: date, product_id, sales_amount). The output should be the total sales for each
product for the week.
4) Explain how Python’s pandas library can be used to handle missing data in large sales datasets.
Provide an example of filling in missing values for daily sales based on the average sales of the
corresponding day of the week.
5) Write a function that takes two lists: one with store IDs and another with their corresponding
revenue. Return a dictionary where the keys are store IDs, and the values are the revenues.
6) What are the key differences between a list and a dictionary in Python? How would you use them
for different tasks in analyzing retail data?

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) A table EnergyConsumption contains columns DeviceID, Timestamp, and EnergyUsed. Write a


query to find the devices that exceeded their daily average energy usage by more than 20% in the
last month.
2) Given a table Machines with columns MachineID, StartTime, EndTime, and Output, write a query
to calculate the total operational time and average output for each machine over the last quarter.
3) You have a table Suppliers with columns SupplierID, Region, and Rating. Write a query to rank
suppliers by their rating within each region.
4) Explain the differences between OLAP and OLTP databases with examples of use cases for each.
5) How would you approach optimizing a SQL query that involves multiple joins and subqueries?

Python:

1) Write a Python script to simulate and visualize the operational efficiency of a machine based on
given performance metrics over time.
2) How would you use Python to connect to an SQL database, query data, and save the results as a
CSV file?
3) Write a Python function to calculate the correlation between two columns in a dataset.
4) Explain how you would use Python libraries such as Pandas and Matplotlib to identify and
visualize trends in manufacturing data.

Power BI:

1) How would you design a Power BI dashboard to monitor the performance of production lines
across multiple factories?
2) What steps would you take to integrate data from multiple sources (e.g., SQL Server, Excel) into a
single Power BI model?
3) Explain the concept of Direct Query in Power BI and its advantages and limitations.
4) How would you use What-If parameters in Power BI to simulate different scenarios for resource
planning?
5) Write a DAX measure to calculate the cumulative production output for a factory over a specific
period.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) You have a table Transactions with columns TransactionID, CustomerID, Date, and Amount. Write
a query to calculate the cumulative revenue per customer for each month in the last year.
2) A table Production contains columns PlantID, Date, and Output. Write a query to identify the
plants that consistently exceeded their daily average output for at least 20 days in a given month.
3) In a table EmployeeAttendance with columns EmployeeID, Date, and Status (values: ‘Present’,
‘Absent’), write a query to find employees with the highest consecutive absences in the last quarter.
4) What are the pros and cons of using indexes in SQL, and when would you avoid using them?
5) Explain the differences between window functions and aggregate functions with examples.

Python:

1) Write a Python script to merge multiple CSV files from a directory into a single file and perform
basic data cleaning.
2) Given a list of dictionaries, write a Python program to group the data by a specific key and
calculate summary statistics for the grouped data.
3) Explain the difference between a list, a tuple, and a dictionary in Python, and provide examples of
their usage.
4) Write a Python function to automate the generation of monthly reports from a dataset stored in
an Excel file.
Power BI:

1) How would you create a dashboard in Power BI to track the operational efficiency of production
plants?
2) Explain how you would handle a situation where the data source refresh in Power BI is causing
delays.
3) What is the difference between row-level security and role-level security in Power BI?
4) How would you use Power BI to visualize trends and outliers in daily sales data?
5) Discuss how you would create a calculated measure to show YoY (Year-over-Year) growth in Power
BI.

General Questions:

1) Share an example where your data-driven insights helped solve a business problem or improve a
process.
2) How do you prioritize tasks and manage deadlines in a high-pressure environment?
3) Why do you want to join Shell, and how do you see your expertise aligning with their vision?

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Retrieve Housing Data from Specific Cities

You’re a data analyst at Airbnb and you’ve been tasked with retrieving housing data from specific
cities. You want to find all Airbnb listings in San Francisco and New York that have at least 10 reviews
and an average rating equal to or above 4.5.

Tables:
• listings: listing_id (integer), name (string), city (string), reviews_count (integer)
• reviews: listing_id (integer), review_id (integer), stars (integer), submit_date (date)

2) Find the Average Number of Guests per Booking in Each City for Airbnb

As an analyst at Airbnb, one of the most useful insights you could provide would be to understand
the average number of guests per booking across locations. For this question, we would like you to
write a SQL query that will find the average number of guests per booking in each city.

Tables:
• bookings: booking_id (integer), property_id (integer), guests (integer), booking_date (date)
• properties: property_id (integer), city (string)

3) Analyzing Click-Through Rates for Airbnb Listing Views and Bookings

The scenario is that Airbnb wants to analyze the click-through conversion rates (CTRs) of their
listings. The CTR is calculated by dividing the number of bookings by the number of listing views,
giving a proportion of views that resulted in a booking.

Tables:
• listing_views: view_id (integer), user_id (integer), visit_date (date), listing_id (integer)
• bookings: booking_id (integer), user_id (integer), booking_date (date), listing_id (integer)

4) Average Vacant Days in 2021 for Active Listings

The strategy team in Airbnb is trying to analyze the impact of Covid-19 during 2021. To do so, they
need you to write a query that outputs the average vacant days across the AirBnbs in 2021. Some
properties have gone out of business, so you should only analyze rentals that are currently active.
Round the results to a whole number.

Assumptions:
• is_active field equals 1 when the property is active, and 0 otherwise.
• In cases where the check-in or check-out date is in another year other than 2021, limit the
calculation to the beginning or end of the year 2021 respectively.
• A listing can be active even if there are no bookings throughout the year.

Tables:
• bookings: listing_id (integer), checkin_date (date), checkout_date (date)
• listings: listing_id (integer), is_active (integer)

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Identify VIP Users for Netflix

Question: To better cater to its most dedicated users, Netflix would like to identify its “VIP users” -
those who are most active in terms of the number of hours of content they watch. Write a SQL query
that will retrieve the top 10 users with the most watched hours in the last month.

Tables:
• users table: user_id (integer), sign_up_date (date), subscription_type (text)
• watching_activity table: activity_id (integer), user_id (integer), date_time (timestamp), show_id
(integer), hours_watched (float)

2) Analyzing Ratings For Netflix Shows

Question: Given a table of user ratings for Netflix shows, calculate the average rating for each show
within a given month. Assume that there is a column for user_id, show_id, rating (out of 5 stars), and
date of review. Order the results by month and then by average rating (descending order).
Tables:
• show_reviews table: review_id (integer), user_id (integer), review_date (timestamp), show_id
(integer), stars (integer)

3) What does EXCEPT / MINUS SQL commands do?

Question: Explain the purpose and usage of the EXCEPT (or MINUS in some SQL dialects) SQL
commands.

4) Filter Netflix Users Based on Viewing History and Subscription Status

Question: You are given a database of Netflix’s user viewing history and their current subscription
status. Write a SQL query to find all active customers who watched more than 10 episodes of a show
called “Stranger Things” in the last 30 days.

Tables:
• users table: user_id (integer), active (boolean)
• viewing_history table: user_id (integer), show_id (integer), episode_id (integer), watch_date (date)
• shows table: show_id (integer), show_name (text)

5) What does it mean to denormalize a database?

Question: Explain the concept and implications of denormalizing a database.

6) Filter and Match Customer’s Viewing Records

Question: As a data analyst at Netflix, you are asked to analyze the customer’s viewing records. You
confirmed that Netflix is especially interested in customers who have been continuously watching a
particular genre - ‘Documentary’ over the last month. The task is to find the name and email of those
customers who have viewed more than five ‘Documentary’ movies within the last month.
‘Documentary’ could be a part of a broader genre category in the genre field (for example,
‘Documentary, History’). Therefore, the matching pattern could occur anywhere within the string.

Tables:
• movies table: movie_id (integer), title (text), genre (text), release_year (integer)
• customer table: user_id (integer), name (text), email (text), last_movie_watched (integer),
date_watched (date)

--------------------------------------------------------------------------------------------------------------------------------------
SQL:

1) Write a query to find the cumulative revenue by month for each product category in a sales table.
2) How would you retrieve the top 5 products by sales volume, excluding any products that had zero
sales in the past 3 months?
3) Given a table of customer transactions, identify all customers who made purchases in two or
more consecutive months.
4) Write a query to calculate the retention rate of users on a monthly basis.
5) Find the nth highest salary from an employee table, where n is a parameter passed dynamically to
the query.
6) Explain how indexing works in SQL and how to decide which columns should be indexed for
optimal performance.
7) Describe the differences between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN and when to use
each one in a complex query.
8) What is the difference between HAVING and WHERE clauses in SQL, and when would you use
each?

Python:

1) Write a Python function to find all unique pairs of integers in a list that sum up to a given target
value.
2) Given a string, write a function to check if it’s a palindrome, ignoring spaces, punctuation, and
case sensitivity.
3) Explain the difference between deep copy and shallow copy in Python. When would you use
each?
4) What are decorators in Python, and how do they work? Provide an example of a scenario where a
decorator would be useful.

Guesstimate Questions:

1) Estimate the number of smartphones sold in India annually.


2) How would you estimate the daily revenue generated by roadside tea stalls across India?

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) What are window functions, and how do they differ from aggregate functions? Can you give a use
case?
2) Explain indexing. When would an index potentially reduce performance, and how would you
approach indexing strategy for a large dataset?
3) Write a query to retrieve customers who have made purchases in the last 30 days but did not
purchase anything in the previous 30 days.
4) Given a table of transactions, find the top 3 most purchased products for each category.
5) How would you identify duplicate records in a large dataset, and how would you remove only the
duplicates, retaining the first occurrence?

Guesstimates:

1) Estimate the number of online food delivery orders in a large metropolitan city over a month.
2) How many customer service calls would a telecom company receive daily for a customer base of 1
million?

Case Studies:

1) A sudden decrease in conversion rate is observed in a popular product category. How would you
investigate the cause and propose solutions?
2) Imagine the company is considering adding a new subscription model. How would you evaluate
its potential impact on customer lifetime value and revenue?

Managerial Questions:

1) Describe a time when you faced conflicting priorities on a project. How did you manage your
workload to meet deadlines?
2) How would you handle a disagreement within the team on an analytical approach?

Python:

1) Write a Python function to find the longest consecutive sequence of unique numbers in a list.
2) If you’re working with a large dataset with missing values, what Python libraries would you use to
handle missing data, and why?

--------------------------------------------------------------------------------------------------------------------------------------

Round 1: Elimination Resume Grilling + SQL

1) 2-3 SQL Question, around self join , usage of Ranking functions and one case specific question
around Lead/lag window function
2) Myntra wants to improve its data quality by identifying duplicate or similar product listings in their
inventory. Due to slight variations in product titles (such as spelling differences, abbreviations, or
extra spaces), some products might appear more than once in the inventory. You are tasked with
identifying pairs of products that are likely duplicates based on their titles, using fuzzy matching.

Round 2: Case Study


1) A basic guess estimate question, Estimate the number of Uber Cars are there in Bengaluru?
2) Myntra, wants to solve the issue pertaining to fraudulent customer behavior, list down all the
metrics and provide some actionable on solving this issue

This round also had resume grilling

Round 3: (Hiring Manager):

1) Resume Grilling around the projects being mentioned on resume.


2) Behavioral questions and whether you would be the right fit at Myntra

Round 4: HR Discussion

1) Discussion around salary, joining date and what culture to expect at Myntra.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Write a query to find the top 3 insurance products with the highest claims in the last year.
2) Given a table of customer policies and claim dates, find customers with multiple claims within the
same month.
3) How would you retrieve policyholders whose premium increased more than 20% in the past year?
4) Provided two table write down the output for left, right, inner and cross join.
5) Describe the purpose of window functions and how they might be used to analyze customer
behaviour trends over time.
6) What is indexing? Explain its use cases?

Case Study:

The interviewer presented a case where we had to analyze the cancellation trends of health
insurance policies within a year. They asked to suggest key metrics to track, the approach to
identifying top reasons for cancellations, and strategies to improve customer retention based on
findings.

Guesstimate:

Estimate the total number of people in India who might be willing to purchase health insurance in
next one year.

------------------------------------------------------------------------------------------------------------------------------------
Power BI :

1) Create a DAX measure to identify the top 5 products by color using data from ‘Product’ (columns:
‘product_id’, ‘product_color’) and ‘Sales’ (columns: ‘product_id’, ‘profit’).
2) Discuss how to set up Row-Level Security (RLS) in Power BI.
3) Outline the various types of filters available in Power BI.
4) Clarify the distinction between the ‘ALL’ and ‘ALLSELECTED’ functions in DAX.
5) Write a DAX formula for calculating the 12-month rolling average of sales with a table ‘sales’ that
includes a ‘DATE’ column, ‘product_id’, and ‘profit’.

Python:

1) Build a dictionary, add items to it, modify an entry, and display it sorted by key.
2) Identify unique values in a list of mixed numbers and show how often each value appears.
3) Detect duplicate values in a list of mixed numbers and count their occurrences.
4) Develop a function to add two numbers, take user input, and handle errors such as non-numeric
or empty input.

SQL:

1) Construct a SQL query to display employee and manager names using a self-join on an
‘employees’ table with ‘emp_id’, ‘name’, and ‘manager_id’ columns.
2) Describe the order of operations in a SQL query.

--------------------------------------------------------------------------------------------------------------------------------------

Power BI:

1) How do you ensure data integrity when combining data from multiple sources in Power BI?
2) What is the difference between a slicer and a filter in Power BI? When would you use each?
3) Describe a time when you optimized a Power BI report for performance. What steps did you take?
4) How do you handle missing values in your Power BI datasets?
5) What techniques can you use to visualize KPI trends effectively in Power BI?
6) How would you create a measure to calculate year-over-year growth in Power BI?
7) Explain how you would implement row-level security in a Power BI report.
8) How do you ensure your Power BI reports are user-friendly for non-technical stakeholders?

SQL:

1) Explain the outputs for all the joins.


2) How would you write a query to find the top five products by sales?
3) Explain the purpose of indexes in SQL. How do they improve query performance?
4) Can you describe a situation where you had to optimize a slow-running SQL query? What
approach did you take?

-------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Describe a scenario where you used SQL to analyze customer data. What insights did you
uncover?
2) Rate your SQL skills on a scale of 1-10 and provide examples of advanced queries you’ve written.
3) Write a query to identify the second-highest salary in each department.
4) Explain the concept of JOINs and provide a query that joins three tables (Orders, Customers,
Products) to find the top 5 customers by revenue.
5) What is a Common Table Expression (CTE) in SQL, and when would you use it? Write a CTE query
to calculate cumulative monthly sales.
6) Write an SQL query to find all employees whose salaries are above the department average.
7) Describe your approach to optimizing SQL queries. Can you share an example where optimization
made a noticeable difference?

Python:

1) What Python libraries do you frequently use for data manipulation, and why?
2) How would you write a Python function to calculate moving averages for sales data?
3) Write a Pandas code snippet to remove outliers from a dataset based on the IQR method.
4) Describe a project where you used Matplotlib or Seaborn for data visualization. What insights did
your visualizations reveal?
5) How would you merge three DataFrames (Sales, Customers, Regions) and compute the average
sales per region?
6) Write a Python code snippet to group data by product category and calculate total revenue for
each category.
7) How do you handle missing data in Python? Share a few imputation techniques you use.
8) Explain how you would use time series analysis in Python for forecasting monthly sales.

--------------------------------------------------------------------------------------------------------------------------------------
SQL:

1) How would you optimize a slow-running query with multiple joins?


2) What is a recursive CTE, and can you provide an example of when to use it?
3) Explain the difference between clustered and non-clustered indexes and when to use each.
4) Write a query to find the second highest salary in each department.
5) How would you detect and resolve deadlocks in SQL?
6) Explain window functions and provide examples of ROW_NUMBER, RANK, and DENSE_RANK.
7) Describe the ACID properties in database transactions and their significance.
8) Write a query to calculate a running total with partitions based on specific conditions.

Power BI:

1) Explain the concept of context transition in DAX and provide an example.


2) How would you optimize a complex Power BI report for faster performance?
3) Describe the process of creating and using calculation groups in Power BI.
4) How would you handle large datasets in Power BI without compromising performance?
5) What is a composite model in Power BI, and how can it be used effectively?
6) How does the USERELATIONSHIP function work, and when would you use it?
7) Explain how to use Power Query M language for advanced data transformations.
8) Differentiate between CROSSFILTER and TREATAS in DAX.

--------------------------------------------------------------------------------------------------------------------------------------

Round 1: Technical (Excel and SQL)

Excel-related Questions:
-Perform aggregation functions on a sample dataset to find unique records.
-Use VLOOKUP by creating a key from two columns for advanced lookups.
-Conditional formatting and other related Excel tasks.

SQL-related Questions:
-Could you explain and apply window functions like ROW_NUMBER, RANK, and DENSE_RANK and
discuss their differences?
-Write SQL queries to delete duplicates from a column while keeping the first occurrence.
-Solve other basic SQL queries, focusing on efficiency and clarity.

Round 2: Business Round


Introduction and Experience:

-Discussed my past work experience and explained my projects in detail.

Guesstimate Question:
-How many Santro cars are there in Shimla? (or you can replace it with your car-city preference).

Business Knowledge Questions:


-What is the impact of AI and ML on the current Indian market?
-Discussed the role of emerging technologies in shaping business strategies.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Find the second-highest salary in a department


• Efficiently solve using ROW_NUMBER() or DENSE_RANK().

2) Calculate daily transactions per user


• Use GROUP BY and COUNT() for aggregation.

3) Projects with the highest budget-per-employee ratio


• Showcase your expertise in JOINs and advanced aggregations!

Power BI:

1) Import vs. Direct Query


• Trade-offs between speed (Import) and real-time data (Direct Query).

2) Slicers vs. Filters


• Understand their impact on dashboards and user experience.

3) Row-Level Security (RLS)


• Restrict data access for specific users or groups in Power BI.

4) Paginated Reports
• Great for multi-page outputs like invoices and statements.

Python:

1) Identify unique values in a list and count occurrences


• Leverage sets and dictionaries for clean, efficient scripts.

2) Merge datasets and calculate sales using pandas


• Master merge(), groupby(), and other pandas functions.
3) Lists, Tuples, Sets, Dictionaries
• Be ready to explain differences and use cases in data analysis.

--------------------------------------------------------------------------------------------------------------------------------------

1) Zomato encountered an issue with their delivery system. Due to an error in the delivery driver
instructions, each item's order was swapped with the item in the subsequent row. As a data analyst,
you're asked to correct this swapping error and return the proper pairing of order ID and item.

If the last item has an odd order ID, it should remain as the last item in the corrected data. For
example, if the last item is Order ID 7 Burger, then it should remain as Order ID 7 in the corrected
data.
Write an SQL query to return the correct pairs of order IDs and items.
orders schema: order_id, item

2) You have been tasked with tracking the average ratings for each restaurant over time. The data
you have been provided with includes individual reviews, with the reviewer's ID, the date of the
review, the restaurant ID, and the rating.
Write a SQL query to find the average rating for each restaurant for each month. Only include
restaurants with at least 2 reviews in a given month.
reviews Schema: review_id, user_id, submit_date, restaurant_id, rating

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) How would you optimize a slow-running query with multiple joins?


2) What is a recursive CTE, and can you provide an example of when to use it?
3) Explain the difference between clustered and non-clustered indexes and when to use each.
4) Write a query to find the second highest salary in each department.
5) How would you detect and resolve deadlocks in SQL?
6) Explain window functions and provide examples of ROW_NUMBER, RANK, and DENSE_RANK.
7) Describe the ACID properties in database transactions and their significance.
8) Write a query to calculate a running total with partitions based on specific conditions.

Power BI:

1) Explain the concept of context transition in DAX and provide an example.


2) How would you optimize a complex Power BI report for faster performance?
3) Describe the process of creating and using calculation groups in Power BI.
4) How would you handle large datasets in Power BI without compromising performance?
5) What is a composite model in Power BI, and how can it be used effectively?
6) How does the USERELATIONSHIP function work, and when would you use it?
7) Explain how to use Power Query M language for advanced data transformations.
8) Differentiate between CROSSFILTER and TREATAS in DAX.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) You have a table Transactions with columns TransactionID, CustomerID, Date, and Amount. Write
a query to calculate the cumulative revenue per customer for each month in the last year.
2) A table Production contains columns PlantID, Date, and Output. Write a query to identify the
plants that consistently exceeded their daily average output for at least 20 days in a given month.
3) In a table EmployeeAttendance with columns EmployeeID, Date, and Status (values: ‘Present’,
‘Absent’), write a query to find employees with the highest consecutive absences in the last quarter.
4) What are the pros and cons of using indexes in SQL, and when would you avoid using them?
5) Explain the differences between window functions and aggregate functions with examples.

Python:

1) Write a Python script to merge multiple CSV files from a directory into a single file and perform
basic data cleaning.
2) Given a list of dictionaries, write a Python program to group the data by a specific key and
calculate summary statistics for the grouped data.
3) Explain the difference between a list, a tuple, and a dictionary in Python, and provide examples of
their usage.
4) Write a Python function to automate the generation of monthly reports from a dataset stored in
an Excel file.

Power BI:

1) How would you create a dashboard in Power BI to track the operational efficiency of production
plants?
2) Explain how you would handle a situation where the data source refresh in Power BI is causing
delays.
3) What is the difference between row-level security and role-level security in Power BI?
4) How would you use Power BI to visualize trends and outliers in daily sales data?
5) Discuss how you would create a calculated measure to show YoY (Year-over-Year) growth in Power
BI.
General Questions:

1) Share an example where your data-driven insights helped solve a business problem or improve a
process.
2) How do you prioritize tasks and manage deadlines in a high-pressure environment?
3) Why do you want to join EXL, and how do you see your expertise aligning with their vision?

--------------------------------------------------------------------------------------------------------------------------------------

1) Write a Python function to analyze daily order data and calculate a 7-day moving average for the
number of orders.
• Input: A list of daily orders: [1200, 1500, 1800, 2000, 2100, 2500, 2700, 3000].
• Output: A list of moving averages.
• Constraint: Optimize for large datasets.

2) Identify customers who have not placed any orders in the last 30 days but had placed more than
10 orders in the previous quarter. Use an orders table with columns: customer_id, order_id, and
order_date.

3) You are tasked with designing a Power BI dashboard to track the performance of Swiggy’s loyalty
program. Include metrics like total members, redemption rate, and ROI of rewards. Explain your
approach to structuring and visualizing the data for actionable insights.

4) Guesstimate:
Estimate the number of food orders Swiggy processes daily in a Tier-1 city like Bangalore.

--------------------------------------------------------------------------------------------------------------------------------------

SQL:

1) Write a query to calculate the total revenue generated by each region.


2) Display the names of employees who have a salary above the average salary in their department.
3) Identify the second-highest salary in each department from the employees table.
(Columns: emp_id, department_id, salary)
4) Write a SQL query to find employees who have not had any sales in the last 3 months.
Power BI:

1) Explain how you would create a dynamic date filter in Power BI for last month’s data.
2) Describe the steps for setting up role-based access in Power BI using Row-Level Security (RLS).
3) What is the difference between a calculated column and a measure in Power BI?
4) How would you approach building a KPI dashboard to track multiple metrics over time?

Python:

1) Write a Python function to filter out customers who made more than 5 purchases in the last 6
months.
2) Create a program to read a CSV file, extract unique values in a column, and save the results to a
new file.
3) Develop a Python script to visualize monthly sales trends using Matplotlib or Seaborn.

Case Study Round (Guesstimate):


1) Estimate the number of coffee cups sold in Delhi daily.
This type of question tests your analytical thinking and problem-solving approach.

You might also like