Operational Analytics and Investigating Metric Spike
INVESTIGATION METRICS
Advanced SQL
─
-ABRAR AHMED
PROJECT DESCRIPTION
This operational analytics and investigation-metrics project involves analyzing various
case studies and answering questions raised in a business using SQL, which helps that
business take specific actions to improve customer satisfaction. The project aims to gain
insights into business operations and identify areas for improvement through the analysis
of data. It also helps the business track user engagement with its products and monitor
user retention over a period of time.
PROJECT APPROACH
For the execution of the project, I used Google BigQuery. Google BigQuery is a
cloud-based data warehouse platform that enables users to store, process, and
analyze large datasets using SQL syntax. It is part of the Google Cloud Platform
(GCP) and offers features such as real-time analytics, automatic scaling, and serverless
computing.
The first step was to create a new project, inside which I created a dataset named
OperationalAnalytics and loaded the tables provided with this project. Secondly, I
examined each dataset carefully while reading the questions to be answered through
the analysis. Finally, after spending some time on querying, I analyzed the datasets.
TECH-STACK USED
MySQL version 8.0 was used for querying, while the platform used was Google BigQuery.
I have also used an online database tool, db-fiddle.com, for some of the executions.
INSIGHTS
Case study I (job data)
The following queries are performed based on case study I:
A. Number of jobs reviewed: The number of jobs reviewed over time.
Query:
/* Calculate the number of jobs reviewed per hour per day for November 2020 */
select cast(ds as date) as dates,
       round((count(job_id) / sum(time_spent)) * 3600) as jobs_reviewed_per_hour
from job_data
where cast(ds as date) between '2020-11-01' and '2020-11-30'
group by dates
order by dates;
Results:
In the above table, we can see that the highest throughput occurs on 28th November,
with 218 jobs reviewed per hour.
B. Throughput: The rate at which jobs are reviewed, measured daily and as a 7-day rolling average.
Results:
In the tables queried above, the throughput for the whole week is 0.03, while the daily
throughput varies from 0.01 to 0.06.
For throughput, do you prefer the daily metric or the 7-day rolling average, and why?
In such cases, I would choose the daily throughput, as being updated daily helps the
company act quickly to improve it.
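The exact query used for this comparison is not reproduced above; the following is only a minimal sketch of how the daily and 7-day rolling throughput could be computed, assuming the same job_data columns used earlier (job_id, ds, and time_spent in seconds):
/* Sketch: daily throughput and its 7-day rolling average */
with daily as (
  select cast(ds as date) as dates,
         count(job_id) / sum(time_spent) as throughput
  from job_data
  group by dates
)
select dates,
       round(throughput, 2) as daily_throughput,
       round(avg(throughput) over (
               order by dates
               rows between 6 preceding and current row), 2) as rolling_7day_throughput
from daily
order by dates;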
C. Language share: The percentage share of each language.
Results:
From the table queried above, it can be concluded that Arabic, English, French, Hindi,
and Italian each hold the same language share of 12.5%, while Persian alone accounts
for 37.5% of the share.
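The query for this breakdown is likewise not shown above; a minimal sketch, assuming job_data carries a language column for each reviewed job (an assumption, since that column does not appear in the queries above), could look like this:
/* Sketch: percentage share of each language */
select language,
       round(100 * count(*) / sum(count(*)) over (), 2) as language_share_pct
from job_data
group by language;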
D. Duplicate rows: Rows in which the same values appear more than once.
Query:
/* Let’s say you see some duplicate rows in the data. How will you display the duplicates
from the table? */
select job_id,
       count(*) as number_of_duplicates
from job_data
group by job_id
having count(*) > 1;
Results:
Similarly, by grouping on whichever columns we want, we can display the duplicate rows
for those particular columns. In this case, I have used two columns, ‘job_id’ and
‘actor_id’, as sketched below.
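To surface rows duplicated on both columns at once, the same pattern can be extended by grouping on the (job_id, actor_id) pair:
/* Display rows that are duplicated on the (job_id, actor_id) pair */
select job_id,
       actor_id,
       count(*) as number_of_duplicates
from job_data
group by job_id, actor_id
having count(*) > 1;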
Case study II (investigating metric spike)
The following queries are performed based on case study II:
A. Weekly user engagement: The number of active users each week.
Results:
The above table showcases the number of active users in different weeks.
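The query behind this table is not reproduced above; a minimal sketch that counts distinct active users per week, assuming only the user_id and occurred_at columns of the events table (the same columns used in the retention query later in this report):
/* Sketch: weekly active users */
select extract(week from occurred_at) as week_number,
       count(distinct user_id) as active_users
from `OperationalAnalytics.events`
group by week_number
order by week_number;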
B. User growth: The growth in the number of users of the product over time.
Results:
In the table queried above, the user growth rate is calculated as a percentage, along
with the respective number of users.
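Again, the original query is not shown above; a sketch of a week-over-week growth-rate calculation, using the same assumed user_id and occurred_at columns of the events table:
/* Sketch: week-over-week user growth rate in percent */
with weekly_users as (
  select extract(week from occurred_at) as week_number,
         count(distinct user_id) as num_users
  from `OperationalAnalytics.events`
  group by week_number
)
select week_number,
       num_users,
       round(100 * (num_users - lag(num_users) over (order by week_number))
             / lag(num_users) over (order by week_number), 2) as growth_rate_pct
from weekly_users
order by week_number;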
C. Weekly Retention: Users retained each week after signing up for a product.
Query:
/* Calculate the weekly retention of users by sign-up cohort */
select first_week as week_number,
sum(case when week_diff = 0 then 1 else 0 end) as week_0,
sum(case when week_diff = 1 then 1 else 0 end) as week_1,
sum(case when week_diff = 2 then 1 else 0 end) as week_2,
sum(case when week_diff = 3 then 1 else 0 end) as week_3,
sum(case when week_diff = 4 then 1 else 0 end) as week_4,
sum(case when week_diff = 5 then 1 else 0 end) as week_5,
sum(case when week_diff = 6 then 1 else 0 end) as week_6,
sum(case when week_diff = 7 then 1 else 0 end) as week_7,
sum(case when week_diff = 8 then 1 else 0 end) as week_8,
sum(case when week_diff = 9 then 1 else 0 end) as week_9,
sum(case when week_diff = 10 then 1 else 0 end) as week_10,
sum(case when week_diff = 11 then 1 else 0 end) as week_11,
sum(case when week_diff = 12 then 1 else 0 end) as week_12,
sum(case when week_diff = 13 then 1 else 0 end) as week_13,
sum(case when week_diff = 14 then 1 else 0 end) as week_14,
sum(case when week_diff = 15 then 1 else 0 end) as week_15,
sum(case when week_diff = 16 then 1 else 0 end) as week_16,
sum(case when week_diff = 17 then 1 else 0 end) as week_17,
sum(case when week_diff = 18 then 1 else 0 end) as week_18
from
  (select m.user_id,
          m.login_week,
          n.first_week,
          m.login_week - n.first_week as week_diff
   from (select user_id,
                extract(week from occurred_at) as login_week
         from `OperationalAnalytics.events`
         group by 1, 2) m
   join (select user_id,
                min(extract(week from occurred_at)) as first_week
         from `OperationalAnalytics.events`
         group by 1) n
     on m.user_id = n.user_id) cohort
group by first_week
order by first_week asc;
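In the resulting cohort table, each row corresponds to a sign-up cohort identified by its first active week (week_number); week_0 gives the size of that cohort, and each subsequent week_N column counts how many of those users were active again N weeks later.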