Interview Pract Questions
Interview Pract Questions
Interview Pract Questions
Question: Given two tables Employees and Departments, where Employees has columns
EmpID, EmpName, and DeptID, and Departments has columns DeptID and DeptName, write
a query to find the names of all employees along with their department names.
Expected Output: The query should return all employees and the department they
belong to.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
INNER JOIN
Departments d ON e.DeptID = d.DeptID;
Left Join Basics:
Question: Write a SQL query to find all employees and their respective departments.
If an employee does not belong to any department, display NULL for the department
name.
Expected Output: List of all employees with their department names, including those
with no department.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
LEFT JOIN
Departments d ON e.DeptID = d.DeptID;
Right Join Basics:
Question: Write a SQL query to display all departments and the employees in each
department. Include departments that have no employees.
Expected Output: All departments, showing employee names where applicable, and NULL
for departments without employees.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
RIGHT JOIN
Departments d ON e.DeptID = d.DeptID;
Full Outer Join Basics:
Question: Write a SQL query to list all employees and departments. Include
employees without departments and departments without employees.
Expected Output: List of all employees and departments, including those without a
matching pair.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
FULL OUTER JOIN
Departments d ON e.DeptID = d.DeptID;
Intermediate Join Questions
Self Join:
Question: Consider an Employees table with columns EmpID, EmpName, and ManagerID,
where ManagerID refers to the EmpID of the employee’s manager. Write a query to
find all employees along with their manager names.
Expected Output: Employee names with their respective manager names.
Answer:
sql
Copy code
SELECT
e1.EmpName AS Employee,
e2.EmpName AS Manager
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.ManagerID = e2.EmpID;
Join with Aggregate Function:
Question: Write a SQL query to find the total sales amount for each customer from
two tables: Orders with columns OrderID, CustomerID, and Amount, and Customers with
columns CustomerID and CustomerName.
Expected Output: Customer name along with their total sales amount.
Answer:
sql
Copy code
SELECT
c.CustomerName,
SUM(o.Amount) AS TotalSales
FROM
Customers c
INNER JOIN
Orders o ON c.CustomerID = o.CustomerID
GROUP BY
c.CustomerName;
Join with WHERE Clause:
Question: Given two tables Products and OrderDetails, where Products has columns
ProductID, ProductName, and Price, and OrderDetails has columns OrderID, ProductID,
and Quantity, write a query to find the total revenue for each product where the
total revenue exceeds $500.
Expected Output: Product names with their total revenue, filtering those above
$500.
Answer:
sql
Copy code
SELECT
p.ProductName,
SUM(od.Quantity * p.Price) AS TotalRevenue
FROM
Products p
INNER JOIN
OrderDetails od ON p.ProductID = od.ProductID
GROUP BY
p.ProductName
HAVING
SUM(od.Quantity * p.Price) > 500;
Advanced Join Questions
Join with Subquery:
Question: Write a query to find the employees who have not placed any orders. You
have two tables, Employees and Orders, where Orders has columns OrderID,
EmployeeID, and OrderDate.
Expected Output: Employee names who haven't placed any orders.
Answer:
sql
Copy code
SELECT
e.EmpName
FROM
Employees e
LEFT JOIN
Orders o ON e.EmpID = o.EmployeeID
WHERE
o.OrderID IS NULL;
Multiple Joins:
Question: Write a query to find all orders along with customer names and product
names. You have three tables: Orders, Customers, and Products. Orders has columns
OrderID, CustomerID, and ProductID.
Expected Output: A list of all orders with the corresponding customer and product
names.
Answer:
sql
Copy code
SELECT
o.OrderID,
c.CustomerName,
p.ProductName
FROM
Orders o
INNER JOIN
Customers c ON o.CustomerID = c.CustomerID
INNER JOIN
Products p ON o.ProductID = p.ProductID;
Self Join with Aggregate:
Question: Write a query to find the top 3 employees with the highest number of
direct reports. You have an Employees table with columns EmpID and ManagerID.
Expected Output: Employee names with the number of direct reports, listing only the
top 3.
Answer:
sql
Copy code
SELECT
e1.EmpName,
COUNT(e2.EmpID) AS NumberOfReports
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.EmpID = e2.ManagerID
GROUP BY
e1.EmpName
ORDER BY
NumberOfReports DESC
LIMIT 3;
These questions are designed to test a range of SQL join operations, from basic
understanding to more complex scenarios, providing a comprehensive assessment of
the candidate's SQL skills.
--------------------------------------
import org.apache.spark.sql.functions.{array_contains,col}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import org.apache.spark.sql.{Row, SparkSession}
val data = Seq(
Row("James","Smith","Smithabc@abc.com,James.Smith@abc.com,Smith@abc.com","CA"),
Row("Michael","Rose","Mic@abc.com,Mic.Rose@abc.com,Roseabc@abc.com","NJ"),
Row("Robert","Williams","Rogger@abc.com,Robert.Williams@abc.com,Williamsabc@abc.com
","NV"),
Row("Robert","Junior",null,"NV"),
Row(null,"James","test@abc.com,helpdesk@abc.com","NV"),
Row("James",null,null,"NV"),
Row(null,"James","james@abc.com","NV"),
Row("James",null,"james@abc.com,helpdesk@abc.com","NV"),
Row("Willium","Junior","james@abc.com,helpdesk@abc.com,Wjunior@abc.com","NV")
)
write simple function that take three inputs and return primary email(string) as
output
example :
email_list = emails.split(",")
if email_list is not null:
if len(email_list) > 1:
for each in email_list:
first_name, last_name = *each.split(".")
if fname == first_name or fname == last_name or lname ==
last_name or lname == first_name :
return each
else:
return email_list[0]
else:
return emails
else:
return null
---------------------------------------------------------------------------
csv with 1000 records.
file1 : 1000 records
file 2: 200 records(100 updated & 100 New)
schema : custId, custFname, custLname, custEmail, custMobile, custZipCode,
custAreaCode, custRegion
Condition: there is no
merge this two datasets in spark to get latest and unique records.
df1 = spark.read.csv("file1)
df2 = spark.rad.csv("file2)
------------------------------------------------------
------------------------------------------
from spark.sql.functions import udf
count = 0
for each in col_name:
if each == ch:
count = count + 1
return count
df = df.Withcolumn("stin", df_count("col"))
2022da04165@wilp.bits-pilani.ac.in
-------------------------------------------------------------------
student_id subject
1 A
1 B
1 C
1 D
2 A
2 D
2 E
2 G
3 E
3 F
3 D
3 C
4 A
4 C
4 G
4 F
output
student_id subject
1 A,B,C,D
2 A,D,E,G
3 E,F,D,C
4 A,C,G,F
select student_id,
case
when student_id = 1 then concat(
3 rd one
---------------------------------------------------------
data1, data2
employee
1 - hyd
2- hyd
3 -pune
select * from (select location,row_numer() as num from employee group by location
order by sal) as A where A.num <3;
select emp, sal from employee as A left join select emp, sal from employye as B
where A.emp != B.emp and A.sal = B.sal;
df = df.sele().distinc()
2 interview
----------------------------------------------------------------------------
df.read
df.filter
df.join
df.write
tasks - 4
stages - 3
jobs -
----------------------------------------------------------------------
-- +--------+----------+---------------------+
-- | userid | activity | activitytime |
-- +--------+----------+---------------------+
-- | 1 | Login | 2024-08-08 08:00:00 |
-- | 1 | Logout | 2024-08-08 12:00:00 |
-- | 1 | Login | 2024-08-08 13:00:00 |
-- | 1 | Logout | 2024-08-08 17:00:00 |
-- | 2 | Login | 2024-08-08 09:00:00 |
-- | 2 | Logout | 2024-08-08 11:00:00 |
-- | 2 | Login | 2024-08-08 14:00:00 |
-- | 2 | Logout | 2024-08-08 18:00:00 |
-- | 3 | Login | 2024-08-08 08:00:00 |
-- | 3 | Logout | 2024-08-08 12:30:00 |
-- +--------+----------+---------------------+
-- output:
-- +--------+--------------+
-- | userid | total_time |
-- +--------+--------------+
-- | 1 | 8.0000000000 |
-- | 2 | 6.0000000000 |
-- | 3 | 4.3000000000 |
-- +--------+--------------+
1 -
Least(logout) - Least(login)
-----------------------------------------------------------------
CustName Products_purchased
Amit Prod_1, Prod_2, Prod_1
Sachin Prod_A, Prod_B, Prod_2
Sonali Prod_1, Prod_B, Prod_2
output:
Product cust_count
Prod_1 2
Prod_2 3
Prod_A 1
Prod_B 2
explore,
final_prodcuts = []
con = df(products_purchased).to_list()
for each in con:
if each not in final_products:
final_products.append(each)
df = createDataframe(final_prodcuts)
df = df.withColumn('cust_count'
when(
---------------------------------------------------