0% found this document useful (0 votes)
7 views10 pages

Interview Pract Questions

Download as txt, pdf, or txt
Download as txt, pdf, or txt
Download as txt, pdf, or txt
You are on page 1/ 10

Here are some practical SQL interview questions focusing on JOIN operations that

test both basic and advanced knowledge:

Basic Join Questions


Inner Join Basics:

Question: Given two tables Employees and Departments, where Employees has columns
EmpID, EmpName, and DeptID, and Departments has columns DeptID and DeptName, write
a query to find the names of all employees along with their department names.
Expected Output: The query should return all employees and the department they
belong to.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
INNER JOIN
Departments d ON e.DeptID = d.DeptID;
Left Join Basics:

Question: Write a SQL query to find all employees and their respective departments.
If an employee does not belong to any department, display NULL for the department
name.
Expected Output: List of all employees with their department names, including those
with no department.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
LEFT JOIN
Departments d ON e.DeptID = d.DeptID;
Right Join Basics:

Question: Write a SQL query to display all departments and the employees in each
department. Include departments that have no employees.
Expected Output: All departments, showing employee names where applicable, and NULL
for departments without employees.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
RIGHT JOIN
Departments d ON e.DeptID = d.DeptID;
Full Outer Join Basics:

Question: Write a SQL query to list all employees and departments. Include
employees without departments and departments without employees.
Expected Output: List of all employees and departments, including those without a
matching pair.
Answer:
sql
Copy code
SELECT
e.EmpName,
d.DeptName
FROM
Employees e
FULL OUTER JOIN
Departments d ON e.DeptID = d.DeptID;
Intermediate Join Questions
Self Join:

Question: Consider an Employees table with columns EmpID, EmpName, and ManagerID,
where ManagerID refers to the EmpID of the employee’s manager. Write a query to
find all employees along with their manager names.
Expected Output: Employee names with their respective manager names.
Answer:
sql
Copy code
SELECT
e1.EmpName AS Employee,
e2.EmpName AS Manager
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.ManagerID = e2.EmpID;
Join with Aggregate Function:

Question: Write a SQL query to find the total sales amount for each customer from
two tables: Orders with columns OrderID, CustomerID, and Amount, and Customers with
columns CustomerID and CustomerName.
Expected Output: Customer name along with their total sales amount.
Answer:
sql
Copy code
SELECT
c.CustomerName,
SUM(o.Amount) AS TotalSales
FROM
Customers c
INNER JOIN
Orders o ON c.CustomerID = o.CustomerID
GROUP BY
c.CustomerName;
Join with WHERE Clause:

Question: Given two tables Products and OrderDetails, where Products has columns
ProductID, ProductName, and Price, and OrderDetails has columns OrderID, ProductID,
and Quantity, write a query to find the total revenue for each product where the
total revenue exceeds $500.
Expected Output: Product names with their total revenue, filtering those above
$500.
Answer:
sql
Copy code
SELECT
p.ProductName,
SUM(od.Quantity * p.Price) AS TotalRevenue
FROM
Products p
INNER JOIN
OrderDetails od ON p.ProductID = od.ProductID
GROUP BY
p.ProductName
HAVING
SUM(od.Quantity * p.Price) > 500;
Advanced Join Questions
Join with Subquery:

Question: Write a query to find the employees who have not placed any orders. You
have two tables, Employees and Orders, where Orders has columns OrderID,
EmployeeID, and OrderDate.
Expected Output: Employee names who haven't placed any orders.
Answer:
sql
Copy code
SELECT
e.EmpName
FROM
Employees e
LEFT JOIN
Orders o ON e.EmpID = o.EmployeeID
WHERE
o.OrderID IS NULL;
Multiple Joins:

Question: Write a query to find all orders along with customer names and product
names. You have three tables: Orders, Customers, and Products. Orders has columns
OrderID, CustomerID, and ProductID.
Expected Output: A list of all orders with the corresponding customer and product
names.
Answer:
sql
Copy code
SELECT
o.OrderID,
c.CustomerName,
p.ProductName
FROM
Orders o
INNER JOIN
Customers c ON o.CustomerID = c.CustomerID
INNER JOIN
Products p ON o.ProductID = p.ProductID;
Self Join with Aggregate:

Question: Write a query to find the top 3 employees with the highest number of
direct reports. You have an Employees table with columns EmpID and ManagerID.
Expected Output: Employee names with the number of direct reports, listing only the
top 3.
Answer:
sql
Copy code
SELECT
e1.EmpName,
COUNT(e2.EmpID) AS NumberOfReports
FROM
Employees e1
LEFT JOIN
Employees e2 ON e1.EmpID = e2.ManagerID
GROUP BY
e1.EmpName
ORDER BY
NumberOfReports DESC
LIMIT 3;
These questions are designed to test a range of SQL join operations, from basic
understanding to more complex scenarios, providing a comprehensive assessment of
the candidate's SQL skills.

--------------------------------------
import org.apache.spark.sql.functions.{array_contains,col}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import org.apache.spark.sql.{Row, SparkSession}
val data = Seq(
Row("James","Smith","Smithabc@abc.com,James.Smith@abc.com,Smith@abc.com","CA"),
Row("Michael","Rose","Mic@abc.com,Mic.Rose@abc.com,Roseabc@abc.com","NJ"),

Row("Robert","Williams","Rogger@abc.com,Robert.Williams@abc.com,Williamsabc@abc.com
","NV"),
Row("Robert","Junior",null,"NV"),
Row(null,"James","test@abc.com,helpdesk@abc.com","NV"),
Row("James",null,null,"NV"),
Row(null,"James","james@abc.com","NV"),
Row("James",null,"james@abc.com,helpdesk@abc.com","NV"),
Row("Willium","Junior","james@abc.com,helpdesk@abc.com,Wjunior@abc.com","NV")
)

val schema = new StructType()


.add("firstName",StringType)
.add("lastName",StringType)
.add("Emails", StringType)
.add("currentState", StringType)

val emailDf = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)

1. if email is null return null

2. if only email return that

3. if there is no match return first email

4. if there is a match return first match and skip others

5. if first name or last name is null handle that properly

write simple function that take three inputs and return primary email(string) as
output

example :

def matchEmailAdd(emails: String, fname: String, lname: String):

email_list = emails.split(",")
if email_list is not null:
if len(email_list) > 1:
for each in email_list:
first_name, last_name = *each.split(".")
if fname == first_name or fname == last_name or lname ==
last_name or lname == first_name :
return each
else:
return email_list[0]
else:
return emails

else:
return null

def matchEmailAdd(emails: String, fname: String, lname: String): String = {}


email_list = emails.split(",") - [James.Smith@abc.com,Smith@abc.com]
if len(email_list) > 1:

for each in email_list:


#count = 0
email_update = email_list.pop(each)
[James.Smith@abc.com,Smith@abc.com]
if each in email_update:
yeild each
else:
yeild email_list[0]
break;
else:

---------------------------------------------------------------------------
csv with 1000 records.
file1 : 1000 records
file 2: 200 records(100 updated & 100 New)
schema : custId, custFname, custLname, custEmail, custMobile, custZipCode,
custAreaCode, custRegion
Condition: there is no
merge this two datasets in spark to get latest and unique records.

df1 = spark.read.csv("file1)
df2 = spark.rad.csv("file2)

df_new = df2.join(df1, on=["custId"],how="left") --100

df_final = df1.join(df2, on=["custId"], how="anit-left") 1100

df_final2 = df_final.join(df_new, on=["custId"], how="anit-left") - 1000


df_final3 = df_final2.union(df_new, on["custId"]) -1200

------------------------------------------------------

Employee SQL table with below colums.


empID, empName, empDesig, empDeptName, empSalary, empLocation, empExp
find second highest salary for each department.

select * from ( selet department, row_number from employeee group by department


order by salary desc) as A where A.row_number =2;

------------------------------------------
from spark.sql.functions import udf

def string_count(col_name, ch):

count = 0
for each in col_name:
if each == ch:
count = count + 1
return count

df_count = udf(string_count, interger())

df = df.Withcolumn("stin", df_count("col"))

2022da04165@wilp.bits-pilani.ac.in
-------------------------------------------------------------------

student_id subject

1 A

1 B

1 C

1 D

2 A

2 D

2 E

2 G

3 E

3 F

3 D
3 C

4 A

4 C

4 G

4 F

output

student_id subject

1 A,B,C,D

2 A,D,E,G

3 E,F,D,C

4 A,C,G,F

select student_id,
case
when student_id = 1 then concat(

val lst = List(1,9,2,7,3,4,11,5,14,6,8,12,15)

ls = [ [x, x+1, x+2] x+1,x+2 in lst for x in lst]

3 rd one

---------------------------------------------------------

hdfs dfs -put/get file_name < path to hdfs>


hdfs dfs -copyToLocal

data1, data2

hdfs dfs -ls path/data* | head -5

employee

count based on location

select count(employee),location from employee group by location;

1 - hyd
2- hyd
3 -pune
select * from (select location,row_numer() as num from employee group by location
order by sal) as A where A.num <3;

select emp, sal from employee as A left join select emp, sal from employye as B
where A.emp != B.emp and A.sal = B.sal;

df = df.sele().distinc()

2 interview

----------------------------------------------------------------------------

df.read

df.filter

df.join

df.write

tasks - 4

stages - 3

jobs -

number of stages and jobs

----------------------------------------------------------------------

-- +--------+----------+---------------------+
-- | userid | activity | activitytime |
-- +--------+----------+---------------------+
-- | 1 | Login | 2024-08-08 08:00:00 |
-- | 1 | Logout | 2024-08-08 12:00:00 |
-- | 1 | Login | 2024-08-08 13:00:00 |
-- | 1 | Logout | 2024-08-08 17:00:00 |
-- | 2 | Login | 2024-08-08 09:00:00 |
-- | 2 | Logout | 2024-08-08 11:00:00 |
-- | 2 | Login | 2024-08-08 14:00:00 |
-- | 2 | Logout | 2024-08-08 18:00:00 |
-- | 3 | Login | 2024-08-08 08:00:00 |
-- | 3 | Logout | 2024-08-08 12:30:00 |
-- +--------+----------+---------------------+

-- output:
-- +--------+--------------+
-- | userid | total_time |
-- +--------+--------------+
-- | 1 | 8.0000000000 |
-- | 2 | 6.0000000000 |
-- | 3 | 4.3000000000 |
-- +--------+--------------+

1 -

Least(logout) - Least(login)
-----------------------------------------------------------------

CustName Products_purchased
Amit Prod_1, Prod_2, Prod_1
Sachin Prod_A, Prod_B, Prod_2
Sonali Prod_1, Prod_B, Prod_2

output:
Product cust_count
Prod_1 2
Prod_2 3
Prod_A 1
Prod_B 2

explore,

final_prodcuts = []
con = df(products_purchased).to_list()
for each in con:
if each not in final_products:
final_products.append(each)

df = createDataframe(final_prodcuts)

df = df.withColumn('cust_count'
when(

---------------------------------------------------

You might also like