Interview Pract Questions

Here are some practical SQL interview questions focusing on JOIN operations that
test both basic and advanced knowledge:
Basic Join Questions

Inner Join Basics:
Question: Given two tables Employees and Departments, where Employees has columns
EmpID, EmpName, and DeptID, and Departments has columns DeptID and DeptName, write
a query to find the names of all employees along with their department names.
Expected Output: The query should return all employees and the department they
belong to.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
INNER JOIN
Departments d ON e.DeptID = d.DeptID;
Left Join Basics:
Question: Write a SQL query to find all employees and their respective departments.
If an employee does not belong to any department, display NULL for the department
name.
Expected Output: List of all employees with their department names, including those
with no department.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
LEFT JOIN
Right Join Basics:
Question: Write a SQL query to display all departments and the employees in each
department. Include departments that have no employees.
Expected Output: All departments, showing employee names where applicable, and NULL
for departments without employees.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
RIGHT JOIN
Full Outer Join Basics:
Question: Write a SQL query to list all employees and departments. Include
employees without departments and departments without employees.
Expected Output: List of all employees and departments, including those without a
matching pair.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
FULL OUTER JOIN
Intermediate Join Questions
Self Join:
Question: Consider an Employees table with columns EmpID, EmpName, and ManagerID,
where ManagerID refers to the EmpID of the employee’s manager. Write a query to
find all employees along with their manager names.
Expected Output: Employee names with their respective manager names.
Answer:
sql
Copy code
SELECT
e1.EmpName AS Employee,
e2.EmpName AS Manager
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.ManagerID = e2.EmpID;
Join with Aggregate Function:
Question: Write a SQL query to find the total sales amount for each customer from
two tables: Orders with columns OrderID, CustomerID, and Amount, and Customers with
columns CustomerID and CustomerName.
Expected Output: Customer name along with their total sales amount.
Answer:
sql
Copy code
SELECT
c.CustomerName,
SUM(o.Amount) AS TotalSales
FROM
Customers c
INNER JOIN
Orders o ON c.CustomerID = o.CustomerID
GROUP BY
c.CustomerName;
Join with WHERE Clause:
Question: Given two tables Products and OrderDetails, where Products has columns
ProductID, ProductName, and Price, and OrderDetails has columns OrderID, ProductID,
and Quantity, write a query to find the total revenue for each product where the
total revenue exceeds $500.
Expected Output: Product names with their total revenue, filtering those above
$500.
Answer:
sql
Copy code
SELECT
p.ProductName,
SUM(od.Quantity * p.Price) AS TotalRevenue
FROM
Products p
INNER JOIN
OrderDetails od ON p.ProductID = od.ProductID
GROUP BY
p.ProductName
HAVING
SUM(od.Quantity * p.Price) > 500;
Advanced Join Questions
Join with Subquery:
Question: Write a query to find the employees who have not placed any orders. You
have two tables, Employees and Orders, where Orders has columns OrderID,
EmployeeID, and OrderDate.
Expected Output: Employee names who haven't placed any orders.
Answer:
sql
Copy code
SELECT
e.EmpName
FROM
Employees e
LEFT JOIN
Orders o ON e.EmpID = o.EmployeeID
WHERE
o.OrderID IS NULL;
Multiple Joins:
Question: Write a query to find all orders along with customer names and product
names. You have three tables: Orders, Customers, and Products. Orders has columns
OrderID, CustomerID, and ProductID.
Expected Output: A list of all orders with the corresponding customer and product
names.
Answer:
sql
Copy code
SELECT
o.OrderID,
c.CustomerName,
p.ProductName
FROM
Orders o
INNER JOIN
Customers c ON o.CustomerID = c.CustomerID
INNER JOIN
Products p ON o.ProductID = p.ProductID;
Self Join with Aggregate:
Question: Write a query to find the top 3 employees with the highest number of
direct reports. You have an Employees table with columns EmpID and ManagerID.
Expected Output: Employee names with the number of direct reports, listing only the
top 3.
Answer:
sql
Copy code
SELECT
e1.EmpName,
COUNT(e2.EmpID) AS NumberOfReports
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.EmpID = e2.ManagerID
GROUP BY
e1.EmpName
ORDER BY
NumberOfReports DESC
LIMIT 3;
These questions are designed to test a range of SQL join operations, from basic
understanding to more complex scenarios, providing a comprehensive assessment of
the candidate's SQL skills.
--------------------------------------
import org.apache.spark.sql.functions.{array_contains,col}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import org.apache.spark.sql.{Row, SparkSession}
val data = Seq(
Row("James","Smith","Smithabc@abc.com,James.Smith@abc.com,Smith@abc.com","CA"),
Row("Michael","Rose","Mic@abc.com,Mic.Rose@abc.com,Roseabc@abc.com","NJ"),
Row("Robert","Williams","Rogger@abc.com,Robert.Williams@abc.com,Williamsabc@abc.com
","NV"),
Row("Robert","Junior",null,"NV"),
Row(null,"James","test@abc.com,helpdesk@abc.com","NV"),
Row("James",null,null,"NV"),
Row(null,"James","james@abc.com","NV"),
Row("James",null,"james@abc.com,helpdesk@abc.com","NV"),
Row("Willium","Junior","james@abc.com,helpdesk@abc.com,Wjunior@abc.com","NV")
)
val schema = new StructType()

.add("firstName",StringType)
.add("lastName",StringType)
.add("Emails", StringType)
.add("currentState", StringType)
val emailDf = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)
1. if email is null return null
2. if only email return that
3. if there is no match return first email
4. if there is a match return first match and skip others
5. if first name or last name is null handle that properly
write simple function that take three inputs and return primary email(string) as
output
example :
def matchEmailAdd(emails: String, fname: String, lname: String):
email_list = emails.split(",")
if email_list is not null:
if len(email_list) > 1:
for each in email_list:
first_name, last_name = *each.split(".")
if fname == first_name or fname == last_name or lname ==
last_name or lname == first_name :
return each
else:
return email_list[0]
else:
return emails
else:
return null
def matchEmailAdd(emails: String, fname: String, lname: String): String = {}

email_list = emails.split(",") - [James.Smith@abc.com,Smith@abc.com]
if len(email_list) > 1:
for each in email_list:

#count = 0
email_update = email_list.pop(each)
[James.Smith@abc.com,Smith@abc.com]
if each in email_update:
yeild each
else:
yeild email_list[0]
break;
else:
---------------------------------------------------------------------------
csv with 1000 records.
file1 : 1000 records
file 2: 200 records(100 updated & 100 New)
schema : custId, custFname, custLname, custEmail, custMobile, custZipCode,
custAreaCode, custRegion
Condition: there is no
merge this two datasets in spark to get latest and unique records.
df1 = spark.read.csv("file1)
df2 = spark.rad.csv("file2)
df_new = df2.join(df1, on=["custId"],how="left") --100
df_final = df1.join(df2, on=["custId"], how="anit-left") 1100
df_final2 = df_final.join(df_new, on=["custId"], how="anit-left") - 1000

df_final3 = df_final2.union(df_new, on["custId"]) -1200
------------------------------------------------------
Employee SQL table with below colums.

empID, empName, empDesig, empDeptName, empSalary, empLocation, empExp
find second highest salary for each department.
select * from ( selet department, row_number from employeee group by department

order by salary desc) as A where A.row_number =2;
------------------------------------------
from spark.sql.functions import udf
def string_count(col_name, ch):
count = 0
for each in col_name:
if each == ch:
count = count + 1
return count
df_count = udf(string_count, interger())
df = df.Withcolumn("stin", df_count("col"))
2022da04165@wilp.bits-pilani.ac.in
-------------------------------------------------------------------
student_id subject
1 A
1 B
1 C
1 D
2 A
2 D
2 E
2 G
3 E
3 F
3 D
3 C
4 A
4 C
4 G
4 F
output
student_id subject
1 A,B,C,D
2 A,D,E,G
3 E,F,D,C
4 A,C,G,F
select student_id,
case
when student_id = 1 then concat(
val lst = List(1,9,2,7,3,4,11,5,14,6,8,12,15)
ls = [ [x, x+1, x+2] x+1,x+2 in lst for x in lst]
3 rd one
---------------------------------------------------------
hdfs dfs -put/get file_name < path to hdfs>

hdfs dfs -copyToLocal
data1, data2
hdfs dfs -ls path/data* | head -5
employee
count based on location
select count(employee),location from employee group by location;
1 - hyd
2- hyd
3 -pune
select * from (select location,row_numer() as num from employee group by location
order by sal) as A where A.num <3;
select emp, sal from employee as A left join select emp, sal from employye as B
where A.emp != B.emp and A.sal = B.sal;
df = df.sele().distinc()
2 interview
----------------------------------------------------------------------------
df.read
df.filter
df.join
df.write
tasks - 4
stages - 3
jobs -
number of stages and jobs
----------------------------------------------------------------------
-- +--------+----------+---------------------+
-- | userid | activity | activitytime |
-- +--------+----------+---------------------+
-- | 1 | Login | 2024-08-08 08:00:00 |
-- | 1 | Logout | 2024-08-08 12:00:00 |
-- | 1 | Login | 2024-08-08 13:00:00 |
-- | 1 | Logout | 2024-08-08 17:00:00 |
-- | 2 | Login | 2024-08-08 09:00:00 |
-- | 2 | Logout | 2024-08-08 11:00:00 |
-- | 2 | Login | 2024-08-08 14:00:00 |
-- | 2 | Logout | 2024-08-08 18:00:00 |
-- | 3 | Login | 2024-08-08 08:00:00 |
-- | 3 | Logout | 2024-08-08 12:30:00 |
-- +--------+----------+---------------------+
-- output:
-- +--------+--------------+
-- | userid | total_time |
-- +--------+--------------+
-- | 1 | 8.0000000000 |
-- | 2 | 6.0000000000 |
-- | 3 | 4.3000000000 |
-- +--------+--------------+
1 -
Least(logout) - Least(login)
-----------------------------------------------------------------
CustName Products_purchased
Amit Prod_1, Prod_2, Prod_1
Sachin Prod_A, Prod_B, Prod_2
Sonali Prod_1, Prod_B, Prod_2
output:
Product cust_count
Prod_1 2
Prod_2 3
Prod_A 1
Prod_B 2
explore,
final_prodcuts = []
con = df(products_purchased).to_list()
for each in con:
if each not in final_products:
final_products.append(each)
df = createDataframe(final_prodcuts)
df = df.withColumn('cust_count'
when(
---------------------------------------------------

Interview Pract Questions

Uploaded by

Document Informationclick to expand document information

Document Informationclick to expand document information

Copyright:

Available Formats

Interview Pract Questions

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Interview Pract Questions

Uploaded by

Copyright:

Available Formats

Here are some practical SQL interview questions focusing on JOIN operations that

test both basic and advanced knowledge:

Basic Join Questions

val schema = new StructType()

val emailDf = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)

1. if email is null return null

2. if only email return that

3. if there is no match return first email

4. if there is a match return first match and skip others

5. if first name or last name is null handle that properly

def matchEmailAdd(emails: String, fname: String, lname: String):

def matchEmailAdd(emails: String, fname: String, lname: String): String = {}

for each in email_list:

df_new = df2.join(df1, on=["custId"],how="left") --100

df_final = df1.join(df2, on=["custId"], how="anit-left") 1100

df_final2 = df_final.join(df_new, on=["custId"], how="anit-left") - 1000

Employee SQL table with below colums.

select * from ( selet department, row_number from employeee group by department

def string_count(col_name, ch):

df_count = udf(string_count, interger())

val lst = List(1,9,2,7,3,4,11,5,14,6,8,12,15)

ls = [ [x, x+1, x+2] x+1,x+2 in lst for x in lst]

hdfs dfs -put/get file_name < path to hdfs>

hdfs dfs -ls path/data* | head -5

count based on location

select count(employee),location from employee group by location;

number of stages and jobs

You might also like