
OPERATIONAL ANALYTICS AND INVESTIGATION METRICS
Advanced SQL

-ABRAR AHMED

PROJECT DESCRIPTION
This operational analytics and investigation metrics project analyzes a set of case studies and answers questions raised in a business using SQL, helping that business take specific actions to improve customer satisfaction. The project aims to gain insights into business operations and identify areas for improvement through data analysis. It also helps the business track user engagement with its products and monitor user retention over time.

PROJECT APPROACH
For the execution of the project, I used Google BigQuery. Google BigQuery is a cloud-based data warehouse platform that enables users to store, process, and analyze large datasets using SQL syntax. It is part of the Google Cloud Platform (GCP) and offers features such as real-time analytics, automatic scaling, and serverless computing.
The first step was to create a new project, inside which I created a dataset named OperationalAnalytics and loaded the tables provided along with this project. Next, I examined the datasets carefully while reading the questions to be answered through the analysis. Finally, after spending some time on querying, I analyzed the datasets.
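
As an illustration of this setup step, the dataset and one of the tables can also be created with BigQuery DDL. The sketch below is only an assumption: the column names and types are inferred from the fields referenced in the queries later in this report, not taken from the original files.

-- Hypothetical sketch of the setup in BigQuery SQL; column names/types are assumed
-- from the fields used in the queries below, not from the provided files.
create schema if not exists OperationalAnalytics;

create table if not exists OperationalAnalytics.job_data (
  ds         string,  -- review date, stored as text and cast to DATE in the queries
  job_id     int64,   -- identifier of the job being reviewed
  actor_id   int64,   -- identifier of the reviewer
  event      string,  -- action taken on the job
  language   string,  -- language of the reviewed content
  time_spent int64    -- seconds spent reviewing the job
);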

TECH-STACK USED
MySQL 8.0 was used for querying, with Google BigQuery as the main platform. I also used the online database tool ‘db-fiddle.com’ for some of the executions.

INSIGHTS
Case study I (job data)
The following queries are performed based on case study I:
A. Number of jobs reviewed: the number of jobs reviewed over time.
Query:
/*Calculate the number of jobs reviewed per hour per day for November 2020*/
select cast(ds as date) as dates,
       round((count(job_id) / sum(time_spent)) * 3600) as jobs_reviewed_per_hour
from job_data
where cast(ds as date) between '2020-11-01' and '2020-11-30'
group by dates;

Results:

In the results table, the highest value occurs on 28 November, with 218 jobs reviewed per hour.

B. Throughput: the number of events happening per second.


Query:
/*Calculate 7 day rolling average of throughput*/
select round(count(event) / sum(time_spent), 2) as weekly_throughput
from job_data;

select cast(ds as date) as dates,
       round(count(event) / sum(time_spent), 2) as daily_throughput
from job_data
group by ds;

Results:

In the two result sets above, the throughput over the whole week is 0.03 events per second, while the daily throughput varies from 0.01 to 0.06.
For throughput, do you prefer the daily metric or the 7-day rolling average, and why?
In such cases, I would choose the daily throughput, as daily updates help the company react to changes and improve throughput more quickly.
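
For reference, the 7-day rolling average itself can be computed with a window function, which both MySQL 8.0 and BigQuery support. The query below is only a sketch, assuming the same job_data columns used above; it treats each day together with the six preceding days as the rolling window.

-- Sketch: 7-day rolling throughput (events per second) over job_data.
select dates,
       round(daily_events / daily_time, 2) as daily_throughput,
       round(sum(daily_events) over w / sum(daily_time) over w, 2) as rolling_7d_throughput
from (
  select cast(ds as date) as dates,
         count(event)    as daily_events,
         sum(time_spent) as daily_time
  from job_data
  group by dates
) d
window w as (order by dates rows between 6 preceding and current row);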

C. Percentage share of each language: share of each language for different contents.
Query:
/*Calculate the percentage share of each language in the last 30 days*/
select language,
       round((count(language) / (select count(*) from job_data)) * 100, 2) as percent_share_of_each_language
from job_data
group by language;

Results:

From the queried table it can be concluded that Arabic, English, French, Hindi, and Italian each hold the same share of 12.5%, while Persian alone accounts for 37.5% of the language share.
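
The query above computes the share over all rows in the table. If the data spanned a longer period, the "last 30 days" restriction from the question could be applied explicitly. Below is a minimal sketch, assuming ds can be cast to a date as in the earlier queries; it keeps the numerator and denominator on the same 30-day window so the shares still sum to 100%.

-- Sketch: percentage share of each language over the most recent 30 days in job_data.
select language,
       round(count(*) * 100.0 / sum(count(*)) over (), 2) as percent_share
from job_data
where cast(ds as date) >= (
        select date_sub(max(cast(ds as date)), interval 30 day)
        from job_data)
group by language;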

D. Duplicate rows: rows that contain the same values as other rows.
Query:
/*Let’s say you see some duplicate rows in the data. How will you display duplicates from the table?*/
select job_id, count(*) as number_of_duplicates
from job_data
group by job_id
having count(*) > 1;

select actor_id, count(*) as number_of_duplicates
from job_data
group by actor_id
having count(*) > 1;

Results:

Similarly, by querying whichever column we are interested in, we can display the duplicate values from that particular column. In this case, I have taken two columns, ‘job_id’ and ‘actor_id’.
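
If the goal is to display the entire duplicate rows rather than only their counts, a window count per job_id can flag them. This is a sketch under the same job_data assumptions as above.

-- Sketch: return every column of the rows whose job_id appears more than once.
select *
from (
  select jd.*,
         count(*) over (partition by job_id) as occurrences
  from job_data jd
) t
where occurrences > 1;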

Case study II (Investigate metric spike)


The following queries are performed based on case study II:
A. User Engagement: measures how active a user is and whether the user finds value in the product/service.
Query:
/*Calculate the weekly user engagement?*/
select extract (week from occurred_at) as week_numbers ,count(distinct user_id)
as active_users
from OperationalAnalytics.events
where event_type = 'engagement'
group by week_numbers
order by week_numbers;

Results:

The table above shows the number of active users in each week.

B. User Growth: the number of users of a product growing over time.


Query:
/*Calculate the user growth for the product?*/
select months, number_of_users,
       round((number_of_users / lag(number_of_users) over (order by months) - 1) * 100, 2)
         as percent_user_growth
from (
  select extract(month from created_at) as months,
         count(activated_at) as number_of_users
  from OperationalAnalytics.users
  where activated_at is not null
  group by months
) monthly_users
order by months asc;

Results:

In the resulting table, the month-over-month user growth rate is shown as a percentage alongside the respective number of users.

C. Weekly Retention: users retained week by week after signing up for the product.
Query:
/*Calculate the weekly retention of users, by sign-up cohort*/
select first_week as week_number,
sum(case when week_diff = 0 then 1 else 0 end) as week_0,
sum(case when week_diff = 1 then 1 else 0 end) as week_1,
sum(case when week_diff = 2 then 1 else 0 end) as week_2,
sum(case when week_diff = 3 then 1 else 0 end) as week_3,
sum(case when week_diff = 4 then 1 else 0 end) as week_4,
sum(case when week_diff = 5 then 1 else 0 end) as week_5,
sum(case when week_diff = 6 then 1 else 0 end) as week_6,
sum(case when week_diff = 7 then 1 else 0 end) as week_7,
sum(case when week_diff = 8 then 1 else 0 end) as week_8,
sum(case when week_diff = 9 then 1 else 0 end) as week_9,
sum(case when week_diff = 10 then 1 else 0 end) as week_10,
sum(case when week_diff = 11 then 1 else 0 end) as week_11,
sum(case when week_diff = 12 then 1 else 0 end) as week_12,
sum(case when week_diff = 13 then 1 else 0 end) as week_13,
sum(case when week_diff = 14 then 1 else 0 end) as week_14,
sum(case when week_diff = 15 then 1 else 0 end) as week_15,
sum(case when week_diff = 16 then 1 else 0 end) as week_16,
sum(case when week_diff = 17 then 1 else 0 end) as week_17,
sum(case when week_diff = 18 then 1 else 0 end) as week_18
from (
  select m.user_id,
         m.login_week,
         n.first_week,
         m.login_week - n.first_week as week_diff
  from (
    select user_id, extract(week from occurred_at) as login_week
    from `OperationalAnalytics.events`
    group by 1, 2
  ) m
  join (
    select user_id, min(extract(week from occurred_at)) as first_week
    from `OperationalAnalytics.events`
    group by 1
  ) n
  on m.user_id = n.user_id
) cohorts
group by first_week
order by first_week asc;
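
Optionally, the weekly counts can be converted into retention rates by dividing each week's count by the week-0 cohort size. Below is a minimal sketch, assuming the cohort query above has been saved as a view with the hypothetical name cohort_counts, showing only the first few weeks.

-- Sketch: express weekly retention as a percentage of the week-0 cohort size.
-- Assumes the cohort query above is saved as a view named cohort_counts (hypothetical).
select week_number,
       week_0,
       round(week_1 * 100.0 / week_0, 2) as week_1_pct,
       round(week_2 * 100.0 / week_0, 2) as week_2_pct,
       round(week_3 * 100.0 / week_0, 2) as week_3_pct
from cohort_counts
order by week_number;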
