0% found this document useful (0 votes)
91 views

Excel / SQL Question

The document provides sample data from an Excel dataset and SQL database with user information. It asks questions about calculating totals from the data, such as the total number of items purchased by female users and total revenue for female users. It also provides the answers to the questions using Excel formulas and SQL queries. Finally, it presents a case study question about writing blog posts with insights derived from WhatsApp user data and provides example insights about user demographics, usage patterns, user identities, assessing exposure to campaigns, hotspot analysis, and fraud analytics.

Uploaded by

Arul Alexander
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
91 views

Excel / SQL Question

The document provides sample data from an Excel dataset and SQL database with user information. It asks questions about calculating totals from the data, such as the total number of items purchased by female users and total revenue for female users. It also provides the answers to the questions using Excel formulas and SQL queries. Finally, it presents a case study question about writing blog posts with insights derived from WhatsApp user data and provides example insights about user demographics, usage patterns, user identities, assessing exposure to campaigns, hotspot analysis, and fraud analytics.

Uploaded by

Arul Alexander
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Excel / SQL question:

Here are the first 5 rows of a 10,000 row dataset in Excel:

For question 1 you can give an excel formula, or if you are more comfortable using R or
python/pandas then imagine the dataset is a dataframe, and give R or python code. Make your
answer as simple and concise as possible.

1a) Give a formula to count the total number of items purchased by all female users.

Assumptions:

A column hosts User ID


B column hosts User Gender
C column hosts the item counts per customer
D column hosts the price paid for each customer

I have assumed the UserGender=’f’ stands for female and UserGender=’m’ stands for male too.

Ans 1a)

=SUM(SUMIF(B2:C10001,{"female","f"},C2:C10001))

1b) Revenue for a specific user is: items x price. Give a formula to calculate the sum of revenue
for all female users.

Ans 1b)

=SUMPRODUCT(C2:C10001,D2:D10001*((B2:B10001="female")+(B2:B10001="f")))

2) Now imagine the same data is in a table in a sql database (whichever SQL system you prefer):
2a) Give a sql query to count the total number of items purchased by all female users.
Assumption: Let this dataset be named as dataset in the SQL database.

The column names are: User_ID ,UserGender ,Items ,Price respectively

Ans. 2a)

select sum(items) as ItemCount from dataset where userGender


in ('female','f');

2b) Give a sql query to calculate the sum of revenue, grouped by gender.

Ans. 2b)

select sum(case when userGender in ('female','f') then


price*items else 0 end) as revenue_female,
sum(case when userGender in('male','m') then
price*items else 0 end) as revenue_male
from dataset;

or

select userGender,sum(price*items) as Revenue from dataset


group by userGender;

Case study question:

Imagine you are in charge of a data blog for Whatsapp and have access to any type
of data the messenger app has. What would be the 3-5 interesting facts about
Whatsapp user behaviour you would write about? What kind of data would you use
for each insight and how would you approach it?

Ans.)
Following are the key interesting facts about WhatsApp user behaviour and the kind
of data to be used for each insight. Data type to be considered is also mentioned
along with each insight:

1) User demographic profile: What is the age, gender, location of the User and
distribution of these demographics with respect to their activities or “online time”
on WhatsApp. This will help us garner targeted campaigns for these users during
their active time thereby increasing visibility towards our posts.
Activity time can be estimated from the timestamp data which gets stored
whenever a user logs into the App. The total number of timestamps along with the
duration of each login will let us create an estimated activity duration and frequency
of usage for a particular user. Latitude & longitude of the user can give us his/her
precise location for storing in our system
2) Usage Pattern: Understanding the usage pattern of the user and thereby finding
his/her online time periods during the day, fluctuation of usage during weekdays vs.
weekends and keeping track of his/her active online period.
Data used would be timestamps over a particular time period for each user.
This data will help us understand the User’s periodic consumption pattern related to
the App which will help us characterise the customer in numerous segments like
always online/daily/frequent/not so frequent.

3) User Identity Creation with Sentiment Analysis:


It has a 3 step approach to the problem:
Assumption: User data would be required for this in case that is not under regulation

This exercise will help us create an identity of the User, his/her behaviour traits as
well as based on his identity we would be able to understand whether he is an
influencer/follower/leader i.e. His/her personality trait
 Identifying the topic of discussion
 Measuring the sentiment on the topic and trying to identify why such a
sentiment might have been triggered
 Predictive Analytics : Based on the past historical sentiment analysis if a
person we can create his or her “identity” and hence predict how he will
react to situations or discussions in the future

4) Assessing exposure to campaigns: Without reading the User’s messages one can
assess his or her reaction to campaigns like say election results or any viral
messaging originating in a particular geographical area. This can give us “Affinity
Quotient” of the user which would measure the vulnerability as well as participation
of the user with respect to events around him.
This can be done on past data pertaining to an event or campaign at a particular
geographical location. The activity times of the users in the vicinity and a
comparative study on the same can give us estimate on the affinity of the User in
relation to any event.

5) Hotspot Analysis: It would be interesting to understand the social network or


reach of a particular user. If a test message is sent to one user then the number of
people the same message is conveyed to can give the answer to his or her social
reach.

This can help us identify social networks which are instrumental in opinion formation
or in dissemination of information

6) Fraud Analytics:
Assumption: User data would be required for this in case that is not under regulation
Using text mining we can find usage of certain keywords to be more prevalent for
particular users through which we can tag them to be high risk and avoid considering
them into our key user base.

Using the User Demographic profile and Usage pattern we can create campaigns
based on demographics and send these to the user in their most “Active online
periods”.

Through Identity creation and Hotspot analysis we can tag users which are potential
influencers/followers/leaders and key opinion makers for our business. Hotspot data
will through light on the networking capability of each User. This helps us focus our
campaigns and offers on particular segments of Users as well as ensure that the
goodwill has potential to reach to numerous connections across their networks. We
can also assess their exposure to campaigns and figure out our high risk users to be
kept away from our campaign engine.

You might also like