Data Science Edited


Data Analyst Interview Questions: SQL

Relational database management systems (RDBMS) remain among the most commonly used databases to date, so SQL skills are indispensable in most job roles, including Data Analyst. Knowing Structured Query Language boosts your path to becoming a data analyst, because it shows interviewers that you know how to handle databases.

Q1. What is the default port for SQL?

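The default TCP port assigned by the Internet Assigned Numbers Authority (IANA) for Microsoft SQL Server is 1433. (SQL itself is a query language and has no port; the port belongs to the database server.)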

Q2. What do you mean by DBMS? What are its different types?

A Database Management System (DBMS) is a software application that interacts with the user, applications, and the database itself to capture and analyze data. The data stored in the database can be modified, retrieved, and deleted, and can be of any type, such as strings, numbers, and images.

There are mainly 4 types of DBMS: Hierarchical, Relational, Network, and Object-Oriented.

• Hierarchical DBMS: As the name suggests, this type of DBMS has a predecessor-successor (parent-child) style of relationship. It has a tree-like structure in which the nodes represent records and the branches represent the links between a parent record and its child records. Each child has exactly one parent.
• Relational DBMS (RDBMS): This type of DBMS stores data in tables of rows and columns. Users identify and access data in relation to other data in the database through shared key values.
• Network DBMS: This type of DBMS supports many-to-many relationships, in which a member record can be linked to multiple owner records.
• Object-oriented DBMS: This type of DBMS stores data as objects. Each object contains a piece of data together with the instructions (methods) for the actions to be performed on that data.

Q3. What is ACID property in a database?
ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. Together, these properties ensure that database transactions are processed reliably. Each term is defined below.

• Atomicity: A transaction either succeeds completely or fails as a whole. Here a transaction is a single logical unit of work, which may consist of several operations. If any part of the transaction fails, the entire transaction fails and the database state is left unchanged.
• Consistency: Every transaction must leave the data satisfying all defined validation rules, such as constraints. A transaction therefore moves the database from one valid state to another and never leaves it half-complete.
• Isolation: Concurrent transactions are kept separate from each other until they are finished, so each transaction behaves as if it were running independently.
• Durability: Once a transaction is committed, it is never lost. The database records changes (for example, in a transaction log) in such a way that, even after a power loss, crash, or other error, the server can recover from the abnormal termination.
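The following minimal sketch makes atomicity concrete. It uses Python's built-in sqlite3 module, with SQLite standing in for any transactional RDBMS. The accounts table, the transfer helper, and the amounts are all hypothetical. Suppose a transfer fails halfway because the CHECK constraint rejects a negative balance. The whole transaction is then rolled back, so the credit that already ran is undone as well.

```python
import sqlite3

# In-memory database; the accounts table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # a negative balance violates the CHECK
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


failed = transfer(conn, "alice", "bob", 500)   # alice has only 100, so the debit fails
after_failure = balances(conn)                 # bob's credit was rolled back too
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failure)     # False {'alice': 100, 'bob': 50}
print(succeeded, after_success)  # True {'alice': 70, 'bob': 80}
```

The `with conn:` block commits when its body finishes normally and rolls back when an exception escapes it. That is what gives the two UPDATE statements their all-or-nothing behavior.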

Q4. What is Normalization? Explain different types of Normalization with advantages.

Normalization is the process of organizing data to avoid duplication and redundancy. There are several successive levels of normalization, called normal forms. Each normal form builds on the previous one, and the first three normal forms are usually adequate.


• First Normal Form (1NF): Each column holds atomic values, and there are no repeating groups within rows.
• Second Normal Form (2NF): The table is in 1NF, and every non-key (supporting) column depends on the whole primary key, not just part of it.
• Third Normal Form (3NF): The table is in 2NF, and every non-key column depends solely on the primary key, not on any other non-key column.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter version of 3NF. A table is in BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.

Some of the advantages are:

• Better database organization
• More tables with smaller rows
• Efficient data access
• Greater flexibility for queries
• Quicker retrieval of information
• Easier implementation of security
• Easier modification
• Reduction of redundant and duplicate data
• A more compact database
• Consistent data after modification

Q5. What are the different types of Joins?


The various types of joins used to retrieve data between tables are Inner Join, Left Join,
Right Join and Full Outer Join. Refer to the image on the right side.

• Inner Join: The most common type of join. It returns only the rows from both tables for which the join condition is satisfied.
• Left Join: Returns all the rows from the left table, plus the matching rows from the right table where the join condition is fulfilled. Where there is no match, the right table's columns are NULL.
• Right Join: Returns all the rows from the right table, plus the matching rows from the left table where the join condition is fulfilled. Where there is no match, the left table's columns are NULL.
• Full Join: Returns all the rows from both the left-hand and the right-hand tables. Rows are matched where the join condition holds, and NULLs fill in where it does not. (MySQL does not support FULL OUTER JOIN directly; it is usually emulated with a UNION of a left join and a right join.)
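The sketch below runs these joins in SQLite through Python's sqlite3 module. The employees and departments tables and their rows are hypothetical. FULL OUTER JOIN is emulated with a UNION, so the example also works on SQLite builds older than 3.39, which lack native RIGHT and FULL joins. A RIGHT JOIN is simply a LEFT JOIN with the two tables swapped.

```python
import sqlite3

# Hypothetical data: Tarun has no department, and Legal has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Marketing'), (3, 'Legal');
    INSERT INTO employees VALUES (10, 'Rahul', 1), (11, 'Rohini', 2), (12, 'Tarun', NULL);
""")

# INNER JOIN: only employees whose dept_id matches a department.
inner = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# LEFT JOIN: every employee; dept_name comes back as None (NULL) where there is no match.
left = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# FULL JOIN emulated: all employees (left join) plus departments that have no employee.
full = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
    UNION
    SELECT e.name, d.dept_name
    FROM departments d LEFT JOIN employees e ON e.dept_id = d.dept_id
    WHERE e.emp_id IS NULL
""").fetchall()

print(inner)  # [('Rahul', 'Sales'), ('Rohini', 'Marketing')]
print(left)   # [('Rahul', 'Sales'), ('Rohini', 'Marketing'), ('Tarun', None)]
print(full)   # the LEFT JOIN rows plus (None, 'Legal'), in no guaranteed order
```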

Q6. Suppose you have a table of employee details consisting of the columns (employeeId, employeeName), and you want to fetch alternate records from the table. How would you perform this task?
You can fetch alternate tuples by using the row number of each tuple. For example, to display the employeeId of the even-numbered records, number the rows with ROW_NUMBER() and keep the rows whose number leaves a remainder of 0 when divided by 2:

SELECT employeeId
FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId
      FROM employee) t
WHERE MOD(rownumber, 2) = 0;

Here 'employee' is the table name, and the rows are numbered in employeeId order. (SQL Server has no MOD() function; write rownumber % 2 = 0 there.)

Similarly, to display the employeeId of the odd-numbered records, write the following query:

SELECT employeeId
FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId
      FROM employee) t
WHERE MOD(rownumber, 2) = 1;
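A quick way to check the even/odd logic is to run it in SQLite from Python. This sketch numbers rows with ROW_NUMBER() ordered by employeeId, which is an assumption because the question does not say how rows are ordered. The sample rows are hypothetical. SQLite uses the % operator (as SQL Server does) instead of MOD().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, employeeName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",  # hypothetical rows
                 [(101, "A"), (102, "B"), (103, "C"), (104, "D"), (105, "E")])


def alternate_ids(conn, remainder):
    """Return employeeIds whose row number (ordered by employeeId) % 2 equals remainder."""
    sql = """
        SELECT employeeId
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId
              FROM employee) AS t
        WHERE rownumber % 2 = ?
        ORDER BY rownumber
    """
    return [row[0] for row in conn.execute(sql, (remainder,))]


print(alternate_ids(conn, 0))  # even rows: [102, 104]
print(alternate_ids(conn, 1))  # odd rows:  [101, 103, 105]
```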
Q7. Consider the following two tables.

Table 5: Example Table  – Data Analyst Interview Questions

Now, write a query to get the list of customers who took the same course more than once on the same day. The results should be grouped by customer and course, and the list should be ordered by date, most recent first.

SELECT c.Customer_Id,
       CustomerName,
       Course_Id,
       Course_Date,
       COUNT(Customer_Course_Id) AS count
FROM customers c
JOIN course_details d ON d.Customer_Id = c.Customer_Id
GROUP BY c.Customer_Id,
         CustomerName,
         Course_Id,
         Course_Date
HAVING COUNT(Customer_Course_Id) > 1
ORDER BY Course_Date DESC;

Table 6: Output Table  – Data Analyst Interview Questions
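The original example tables are images, so this sketch recreates them with hypothetical rows. The column names are taken from the query itself. It then runs the same query in SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (Customer_Id INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE course_details (
        Customer_Course_Id INTEGER PRIMARY KEY,
        Customer_Id INTEGER REFERENCES customers(Customer_Id),
        Course_Id INTEGER,
        Course_Date TEXT          -- ISO 'YYYY-MM-DD' strings sort chronologically
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO course_details VALUES
        (1, 1, 10, '2023-01-05'), (2, 1, 10, '2023-01-05'),  -- Asha twice on one day
        (3, 1, 11, '2023-01-06'),
        (4, 2, 10, '2023-02-01'), (5, 2, 10, '2023-02-01'),  -- Ben twice on one day
        (6, 2, 10, '2023-02-02');
""")

rows = conn.execute("""
    SELECT c.Customer_Id, CustomerName, Course_Id, Course_Date,
           COUNT(Customer_Course_Id) AS count
    FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
    GROUP BY c.Customer_Id, CustomerName, Course_Id, Course_Date
    HAVING COUNT(Customer_Course_Id) > 1
    ORDER BY Course_Date DESC
""").fetchall()
print(rows)  # [(2, 'Ben', 10, '2023-02-01', 2), (1, 'Asha', 10, '2023-01-05', 2)]
```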


Q8. Consider the Employee_Details table below. The table has columns such as Employee_Id, EmployeeName, Age, Gender, and Shift, where Shift is m = Morning Shift or e = Evening Shift. Write a single UPDATE query that swaps the values, turning every 'm' into 'e' and every 'e' into 'm'.

Table 7: Example Table  – Data Analyst Interview Questions

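You can write the query below:

UPDATE Employee_Details
SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END;

Listing both values explicitly, with ELSE Shift, leaves any other value (including NULL) unchanged. The shorter form CASE Shift WHEN 'm' THEN 'e' ELSE 'm' END also works, but only if the column contains nothing except 'm' and 'e'.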
Table 8: Output Table  – Data Analyst Interview Questions
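Here is a minimal check of the single-statement swap in SQLite. The rows are hypothetical, and only the columns the update needs are created. The sketch uses the variant that lists both values explicitly (WHEN 'e' THEN 'm' ELSE Shift), so any other value is left alone. One pass is enough because each row's CASE is evaluated against that row's pre-update value, so an 'm' that becomes 'e' is never flipped back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (Employee_Id INTEGER, EmployeeName TEXT, Shift TEXT)")
conn.executemany("INSERT INTO Employee_Details VALUES (?, ?, ?)",  # hypothetical rows
                 [(1, "A", "m"), (2, "B", "e"), (3, "C", "m")])

# One UPDATE swaps both values; the CASE sees each row's value before the update.
conn.execute("""
    UPDATE Employee_Details
    SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END
""")
shifts = [r[0] for r in conn.execute("SELECT Shift FROM Employee_Details ORDER BY Employee_Id")]
print(shifts)  # ['e', 'm', 'e']
```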

Q9. Write a SQL query to get the third highest salary of an employee from the Employee_Details table, as illustrated below.

Table 9: Example Table  – Data Analyst Interview Questions

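SELECT TOP 1 Salary
FROM (SELECT TOP 3 Salary
      FROM Employee_Details
      ORDER BY Salary DESC) AS emp
ORDER BY Salary ASC;

This query uses SQL Server's TOP syntax. If two employees can share a salary, write SELECT DISTINCT TOP 3 Salary in the inner query so that the result is the third highest distinct salary.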
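SQLite has no TOP keyword, so this sketch expresses the same nested query with LIMIT. It runs through Python's sqlite3 module on hypothetical salaries that include a tie. It also shows why a DISTINCT is worth adding when salaries can repeat: without it, the tie pushes a repeated salary into the top three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (EmployeeName TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee_Details VALUES (?, ?)",
    [("A", 24000), ("B", 55000), ("C", 75000), ("D", 55000), ("E", 40000)],  # hypothetical
)

# Same shape as the TOP query: take the top three salaries, then the smallest of them.
naive = conn.execute("""
    SELECT Salary
    FROM (SELECT Salary FROM Employee_Details ORDER BY Salary DESC LIMIT 3) AS emp
    ORDER BY Salary ASC
    LIMIT 1
""").fetchone()[0]

# DISTINCT collapses the tie, so the inner query returns three different salaries.
third = conn.execute("""
    SELECT Salary
    FROM (SELECT DISTINCT Salary FROM Employee_Details ORDER BY Salary DESC LIMIT 3) AS emp
    ORDER BY Salary ASC
    LIMIT 1
""").fetchone()[0]

print(naive)  # 55000: the top three rows are 75000, 55000, 55000
print(third)  # 40000: the third highest distinct salary
```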

Q10. What is the difference between NVL and NVL2 functions in SQL?

NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are Oracle functions that check whether the value of exp1 is null.

With NVL(exp1, exp2), if exp1 is not null, the value of exp1 is returned; otherwise, the value of exp2 is returned. exp2 must be of the same data type as exp1 (or convertible to it).

Similarly, with NVL2(exp1, exp2, exp3), if exp1 is not null, exp2 is returned; otherwise, exp3 is returned.
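SQLite, like MySQL and SQL Server, has no NVL or NVL2. The sketch below, using a hypothetical emp table, shows the usual portable equivalents: COALESCE(exp1, exp2) in place of NVL, and a CASE expression in place of NVL2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, commission INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("A", 500), ("B", None)])  # None -> NULL

rows = conn.execute("""
    SELECT name,
           -- NVL(commission, 0): the first argument unless it is NULL
           COALESCE(commission, 0) AS nvl_result,
           -- NVL2(commission, 'has commission', 'no commission')
           CASE WHEN commission IS NOT NULL THEN 'has commission'
                ELSE 'no commission' END AS nvl2_result
    FROM emp
    ORDER BY name
""").fetchall()
print(rows)  # [('A', 500, 'has commission'), ('B', 0, 'no commission')]
```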

If you wish to see more SQL questions, refer to the full set of SQL Interview Questions below.

SQL Interview Questions


1. What is the difference between SQL and MySQL?
2. What are the different subsets of SQL?

3. What do you mean by DBMS? What are its different types?
4. What do you mean by table and field in SQL?
5. What are joins in SQL?
6. What is the difference between CHAR and VARCHAR2 datatype in SQL?
7. What is the Primary key?
8. What are Constraints?
9. What is the difference between DELETE and TRUNCATE statements?
10. What is a Unique key?
Q1. What is the difference between SQL and MySQL?
SQL vs MySQL

SQL | MySQL
SQL (Structured Query Language) is a standard, English-based query language. | MySQL is a database management system.
SQL is the core of the relational database and is used for accessing and managing databases. | MySQL is an RDBMS (Relational Database Management System), like SQL Server, Informix, etc.
Q2. What are the different subsets of SQL?

• Data Definition Language (DDL): It allows you to define and modify database objects with commands such as CREATE, ALTER, and DROP.
• Data Manipulation Language (DML): It allows you to access and manipulate data. It helps you insert, update, delete, and retrieve data from the database.
• Data Control Language (DCL): It allows you to control access to the database. Example: GRANT and REVOKE access permissions.
Q3. What do you mean by DBMS? What are its different types?

A Database Management System (DBMS) is a software application that interacts with the user, applications, and the database itself to capture and analyze data. A database is a structured collection of data. A DBMS allows a user to interact with the database. The data stored in the database can be modified, retrieved, and deleted, and can be of any type, such as strings, numbers, and images.

Broadly, there are two types of DBMS:

• Relational Database Management System: The data is stored in relations (tables). Example: MySQL.
• Non-Relational Database Management System: There is no concept of relations, tuples, and attributes. Example: MongoDB.
Q4. What is RDBMS? How is it different from DBMS?
A relational database management system (RDBMS) is a set of applications and features that allow IT professionals and others to develop, edit, administer, and interact with relational databases. Most commercial relational database management systems use Structured Query Language (SQL) to access the database, which is stored in the form of tables.
The RDBMS is the most widely used database system in businesses all over the world. It offers a stable means of storing and retrieving massive amounts of data.
Databases, in general, hold collections of data that may be accessed and used in other applications. A database management system supports the development, administration, and use of database platforms.
A relational database management system (RDBMS) is a type of database management system (DBMS) that stores data in a row-based table structure that links related data elements. An RDBMS contains functions that ensure the data's security, accuracy, integrity, and consistency. This differs from the file storage used by a basic database management system.
The following are some further distinctions between database management systems and relational database management systems:
Number of users permitted to use the system
A DBMS typically handles only one user at a time, whereas an RDBMS can handle many users.
Hardware and software specifications
A DBMS requires less software and hardware than an RDBMS.
Amount of information
RDBMSes can handle any quantity of data, from tiny to enormous, whereas DBMSes are limited to small amounts.
Structure of the database
In a DBMS, data is stored in a hierarchical format, whereas an RDBMS uses tables whose headers serve as column names and whose rows hold the associated values.
Implementation of the ACID principle
DBMSs do not use the atomicity, consistency, isolation, and durability (ACID) model for data storage. RDBMSes, on the other hand, use the ACID model to organize their data and ensure consistency.
Distributed databases
A DBMS does not provide complete support for distributed databases, whereas an RDBMS does.
Programs that are managed
A DBMS focuses on keeping databases that are present within the computer network and system hard disks, whereas an RDBMS helps manage relationships between its incorporated tables of data.
Support for normalization
An RDBMS can be normalized, but a DBMS cannot.
Q5. What is a Self-Join?
A self-join is a join of a table with itself, so it represents a unary (recursive) relationship. The same table appears twice under different aliases, and each row is matched with the rows of the same table that satisfy the join condition. Self-joins are mostly used to combine and compare rows within the same database table, for example to pair each employee with their manager.
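A typical self-join pairs each employee with their manager from the same table. The sketch below uses a hypothetical employee table whose manager_id column refers back to id, and queries it through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",  # hypothetical rows
                 [(1, "Rahul", None), (2, "Rohini", 1), (3, "Shylesh", 1), (4, "Tarun", 2)])

# Two aliases of the SAME table: e is the employee row, m is the manager row.
pairs = conn.execute("""
    SELECT e.name, m.name AS manager
    FROM employee e
    JOIN employee m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()
print(pairs)  # [('Rohini', 'Rahul'), ('Shylesh', 'Rahul'), ('Tarun', 'Rohini')]
```

Rahul has no manager (manager_id is NULL), so the inner self-join leaves him out. A LEFT JOIN would keep him with a NULL manager.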
Q6. What is the SELECT statement?
A SELECT statement retrieves zero or more rows from one or more database tables or views. In most applications, SELECT is the most frequently used data manipulation language (DML) command. Because SQL is a declarative language, a SELECT query specifies what result set is wanted, not how to compute it. The database's query optimizer chooses how to execute it.
Q7. What are some common clauses used with SELECT query in SQL?
The following are some frequently used SQL clauses in a SELECT query:
WHERE clause: Filters rows according to the specified conditions.
ORDER BY clause: Sorts the result in ascending (ASC, the default) or descending (DESC) order of the specified field(s).
GROUP BY clause: Groups rows that have identical values in the specified columns. It is typically used with aggregate functions (COUNT, SUM, AVG, and so on) to produce summarized results.
HAVING clause: Filters groups in combination with the GROUP BY clause. It differs from WHERE, because the WHERE clause filters individual rows before grouping and cannot filter on aggregated values.
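The sketch below uses all four clauses in one query on hypothetical Employee rows, run in SQLite via Python's sqlite3. It shows the order in which the clauses act. WHERE drops individual rows, GROUP BY forms groups, HAVING drops whole groups based on an aggregate, and ORDER BY sorts what is left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER, Name TEXT, Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [  # hypothetical rows
    (1, "Rahul", "Sales", 24000), (2, "Rohini", "Marketing", 34000),
    (3, "Shylesh", "Sales", 24000), (4, "Tarun", "Analytics", 30000),
    (5, "Priya", "Marketing", 20000),
])

rows = conn.execute("""
    SELECT Department, COUNT(*) AS headcount, SUM(Salary) AS total
    FROM Employee
    WHERE Salary < 34000          -- row filter: removes Rohini before grouping
    GROUP BY Department           -- one group per department
    HAVING SUM(Salary) >= 30000   -- group filter on an aggregate: removes Marketing (20000)
    ORDER BY total DESC           -- sort the surviving groups
""").fetchall()
print(rows)  # [('Sales', 2, 48000), ('Analytics', 1, 30000)]
```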
Q8. What are UNION, MINUS and INTERSECT commands?
The UNION operator combines the results of two SELECT queries and removes duplicate rows (UNION ALL keeps them).
The MINUS operator returns the rows produced by the first query that are not produced by the second query. MINUS is Oracle's name; SQL Server, PostgreSQL, and SQLite call it EXCEPT.
The INTERSECT operator returns only the rows produced by both queries.
Before running any of these statements, certain requirements must be satisfied:
Each SELECT query must have the same number of columns.
The corresponding columns must have compatible data types.
The columns in each SELECT statement must be in the same order.
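Here the three operators run in SQLite via Python's sqlite3 module on two hypothetical one-column tables. SQLite spells MINUS as EXCEPT. The op argument is only ever one of the fixed keywords shown, so building the SQL string with an f-string is safe in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course_a (student TEXT);
    CREATE TABLE course_b (student TEXT);
    INSERT INTO course_a VALUES ('Ann'), ('Bob'), ('Cid');
    INSERT INTO course_b VALUES ('Bob'), ('Cid'), ('Dee');
""")


def run(op):
    """Combine the two single-column queries with op ('UNION', 'EXCEPT' or 'INTERSECT')."""
    sql = f"SELECT student FROM course_a {op} SELECT student FROM course_b ORDER BY student"
    return [row[0] for row in conn.execute(sql)]


print(run("UNION"))      # ['Ann', 'Bob', 'Cid', 'Dee']
print(run("EXCEPT"))     # ['Ann']  (Oracle: MINUS)
print(run("INTERSECT"))  # ['Bob', 'Cid']
```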
Q9. What is Cursor? How to use a Cursor?
A cursor lets you process the rows of a query result one at a time. It is used as follows:
DECLARE the cursor after any variable declarations. The cursor definition must always be associated with a SELECT statement.
OPEN the cursor to execute the SELECT statement and populate the result set. The OPEN statement must be executed before any rows can be fetched from the result set.
Use the FETCH command to retrieve the next row from the result set.
Use the CLOSE command to deactivate the cursor.
Finally, use the DEALLOCATE command to remove the cursor definition and free up the resources associated with it.
Q10. List the different types of relationships in SQL.
There are several types of relationships in a database:
One-to-One: A relationship between two tables in which each record in one table corresponds to at most one record in the other.
One-to-Many and Many-to-One: The most common relationship, in which a record in one table is linked to several records in another.
Many-to-Many: Used when a relationship requires several instances on each side. It is usually implemented with a third (junction) table.
Self-Referencing Relationships: Used when a table needs to declare a relationship with itself.
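As an illustration of the many-to-many case, the hypothetical schema below links students and courses through a junction table. The junction table turns one many-to-many relationship into two one-to-many relationships. The sketch runs in SQLite through Python's sqlite3 module. Note that SQLite enforces FOREIGN KEY constraints only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEYs only when enabled
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, course) pair.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO courses VALUES (10, 'SQL'), (20, 'Python');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Ann takes two courses and SQL has two students: many-to-many.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN courses c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)  # [('Ann', 'Python'), ('Ann', 'SQL'), ('Bob', 'SQL')]

try:
    conn.execute("INSERT INTO enrollments VALUES (3, 10)")  # student 3 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```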
Q12. What is OLTP?
OLTP, or online transaction processing, allows large numbers of people to execute large volumes of database transactions in real time, usually over the internet. A database transaction occurs when data in a database is changed, inserted, deleted, or queried.
Q13. What are the differences between OLTP and OLAP?
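OLTP stands for online transaction processing, whereas OLAP stands for online analytical processing. OLTP is an online database modification system, whereas OLAP is an online database query response system. OLTP handles large numbers of short transactions that insert, update, and delete data, while OLAP answers complex analytical queries over large volumes of historical data.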
Q14. How to create empty tables with the same structure as another table?
To create an empty table with the same structure as an existing one, fetch the records of the existing table INTO a new table, using a WHERE condition that is false for every row (for example, WHERE 1 = 0). SQL creates the new table with the same column structure to accept the fetched rows. Nothing is stored in it, because no row satisfies the WHERE condition. In SQL Server, for example (Students and Students_copy are placeholder names):
SELECT * INTO Students_copy FROM Students WHERE 1 = 0;
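SQLite does not support SELECT ... INTO a new table, so this sketch uses the equivalent CREATE TABLE ... AS SELECT with an always-false WHERE 1 = 0. The Students table is hypothetical. In SQLite, as with SELECT INTO in SQL Server, this copies the column names but not constraints or indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, name TEXT, gpa REAL)")  # hypothetical table
conn.execute("INSERT INTO Students VALUES (1, 'Ann', 3.5)")

# WHERE 1 = 0 is never true, so the new table gets the columns but no rows.
conn.execute("CREATE TABLE Students_copy AS SELECT * FROM Students WHERE 1 = 0")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students_copy)")]
count = conn.execute("SELECT COUNT(*) FROM Students_copy").fetchone()[0]
print(columns, count)  # ['id', 'name', 'gpa'] 0
```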
Q15. What is PostgreSQL?
In 1986, a team led by computer science professor Michael Stonebraker created PostgreSQL under the name Postgres. It was created to help developers build enterprise-level applications by ensuring data integrity and fault tolerance. PostgreSQL is an enterprise-level, versatile, resilient, open-source, object-relational database management system that supports variable workloads and concurrent users. The international developer community has consistently backed it, and it has gained significant appeal among developers because of its fault-tolerant characteristics.
It is a very reliable database management system, with more than two decades of community work to thank for its high levels of resiliency, integrity, and accuracy. Many online, mobile, geospatial, and analytics applications use PostgreSQL as their primary data store or data warehouse.
Q16. What are SQL comments?
SQL comments are used to explain portions of SQL statements or to prevent SQL statements from being executed. Comments are important in most programming languages. Note that Microsoft Access databases do not support SQL comments.
Single-line comments: Start with two consecutive hyphens (--).
Multi-line comments: Start with /* and end with */.
Q17. What is the usage of the NVL() function?
You can use the NVL function to replace null values with a default value. The function returns the value of the second parameter if the first parameter is null. If the first parameter is anything other than null, it is returned unchanged.
This function is available in Oracle, not in SQL Server or MySQL. Instead of NVL(), MySQL has IFNULL() and SQL Server has ISNULL(); the standard COALESCE() function works in all three.
Q18. Explain character-manipulation functions and their different types in SQL.
Character-manipulation functions change, extract, and edit character strings. When one or more characters or words are passed to such a function, it performs its operation on the input strings and returns the result.
The character-manipulation functions in SQL are as follows:
A) CONCAT (joining two or more values): This function joins two or more values together. The second string is always appended to the end of the first string.
B) SUBSTR: This function returns a segment of a string, starting at a given position and extending for a given number of characters.
C) LENGTH: This function returns the length of the string as a number, including blank spaces.
D) INSTR: This function returns the numeric position of a character or word within a string.
E) LPAD: This function pads the left side of a string with a given character up to a given length, producing right-justified values.
F) RPAD: This function pads the right side of a string with a given character up to a given length, producing left-justified values.
G) TRIM: This function removes the specified characters (spaces by default) from the beginning, end, or both ends of a string.
H) REPLACE: This function replaces all occurrences of a word or substring with another specified string value.
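The sketch below evaluates these functions in SQLite via Python's sqlite3 module. SQLite uses the || operator for concatenation, since only recent versions of SQLite provide concat(). SQLite also has no LPAD or RPAD, so those two are emulated with substr() here, for inputs of at most 5 characters. In Oracle or MySQL you would call LPAD('42', 5, '0') and RPAD('42', 5, '0') directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT 'Data' || ' ' || 'Analyst'      AS concat_result,   -- CONCAT
           substr('Database', 1, 4)        AS substr_result,   -- SUBSTR(string, start, length)
           length('SQL Server')            AS length_result,   -- the space is counted
           instr('interview', 'view')      AS instr_result,    -- 1-based position, 0 if absent
           substr('00000' || '42', -5, 5)  AS lpad_result,     -- emulates LPAD('42', 5, '0')
           substr('42' || '00000', 1, 5)   AS rpad_result,     -- emulates RPAD('42', 5, '0')
           trim('  padded  ')              AS trim_result,     -- spaces removed at both ends
           replace('2023-01-05', '-', '/') AS replace_result   -- every '-' replaced
""").fetchone()
print(row)  # ('Data Analyst', 'Data', 10, 6, '00042', '42000', 'padded', '2023/01/05')
```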
Q19. Write the SQL query to get the third maximum salary of an
employee from a table named employees.
Employee table
employee_name salary
A 24000
C 34000
D 55000
E 75000
F 21000
G 40000
H 50000
 

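SELECT * FROM (
    SELECT employee_name, salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS r
    FROM employees) ranked
WHERE r = &n;
To find the 3rd highest salary, set n = 3. Here &n is an Oracle SQL*Plus substitution variable that prompts for a value; in other tools, replace it with a literal or a bind parameter.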
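Running the same DENSE_RANK() query against the sample table above in SQLite, via Python's sqlite3, confirms the answer. In this sketch a ? bind parameter takes the place of the &n substitution variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [  # rows from the sample table
    ("A", 24000), ("C", 34000), ("D", 55000), ("E", 75000),
    ("F", 21000), ("G", 40000), ("H", 50000),
])


def nth_highest(conn, n):
    """Return the employee(s) holding the n-th highest distinct salary."""
    return conn.execute("""
        SELECT employee_name, salary
        FROM (SELECT employee_name, salary,
                     DENSE_RANK() OVER (ORDER BY salary DESC) AS r
              FROM employees) AS ranked
        WHERE r = ?
    """, (n,)).fetchall()


print(nth_highest(conn, 3))  # [('H', 50000)]
```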
Q20. What is the difference between the RANK() and DENSE_RANK()
functions?
The RANK() function assigns each row a rank within its ordered partition. Tied rows receive the same rank, and the next rank skips ahead by the number of tied rows. For example, if three records share rank 4, the next rank is 7.
The DENSE_RANK() function also gives tied rows the same rank, but it leaves no gaps: the next rank is always the next consecutive number. For example, if three records share rank 4, the next rank is 5.
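The example above, with three records tied at rank 4, can be reproduced in SQLite with hypothetical scores. The sketch computes both functions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [  # hypothetical rows
    ("p1", 100), ("p2", 95), ("p3", 93), ("p4", 90), ("p5", 90), ("p6", 90), ("p7", 85),
])

rows = conn.execute("""
    SELECT points,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC
""").fetchall()
ranks = [r[1] for r in rows]
dense = [r[2] for r in rows]
print(ranks)  # [1, 2, 3, 4, 4, 4, 7]  (gap after the tie)
print(dense)  # [1, 2, 3, 4, 4, 4, 5]  (no gap)
```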
Q21. What are Tables and Fields?
A table is a collection of data components organized in rows and columns in a relational
database. A table can also be thought of as a useful representation of relationships. The
most basic form of data storage is the table. An example of an Employee table is shown
below.
ID Name Department Salary
1 Rahul Sales 24000
2 Rohini Marketing 34000
3 Shylesh Sales 24000
4 Tarun Analytics 30000
 
A record, or row, is a single entry in a table. A record represents a collection of related data. The Employee table above, for example, has four records.
A table is made up of numerous records (rows), each of which can be broken down into smaller units called fields (columns). ID, Name, Department, and Salary are the four fields in the Employee table above.
Q22. What is a UNIQUE constraint?
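A UNIQUE constraint ensures that no two rows have the same value in the constrained column (or combination of columns). Unlike a primary key, a table can have several UNIQUE constraints, and the column may accept NULL values; how many NULLs are allowed varies by database.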
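Here is a minimal demonstration in SQLite, using a hypothetical users table. The second insert of the same email violates the UNIQUE constraint and raises sqlite3.IntegrityError, so the table still holds one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")  # hypothetical
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")  # duplicate value
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)  # the exact message text depends on the SQLite version

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, count)  # True 1
```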
Q29. What are some examples of SQL statements?
SQL is a database query language that allows you to edit, remove, and request data from databases. The following are a few examples of SQL statements:

• SELECT
• INSERT
• UPDATE
• DELETE
• CREATE DATABASE
• ALTER DATABASE
Q30. What are basic SQL skills?
SQL skills help data analysts create, maintain, and retrieve data from relational databases, which organize data into columns and rows. SQL also enables users to efficiently retrieve, update, manipulate, insert, and alter data.
The most fundamental abilities that a SQL expert should possess are:

1. Database management
2. Structuring a database
3. Creating SQL clauses and statements
4. SQL system skills, such as MySQL and PostgreSQL
5. PHP expertise (useful)
6. Analyzing SQL data
7. Using WAMP with SQL to create a database
8. OLAP skills
Q31. What is schema in SQL Server?
A schema is a logical representation of a database's structure, often drawn as a diagram. It specifies the relationships among the database's various entities, the constraints that may be applied to the database, and the various data types. It also covers tables and views. In SQL Server specifically, a schema is also a named container (such as the default dbo schema) that groups database objects such as tables, views, and stored procedures.
Schemas come in a variety of shapes and sizes. The star schema and the snowflake schema are two of the most popular. In a star schema, a central fact table is surrounded by dimension tables in a star shape. In a snowflake schema, the dimension tables are further normalized into additional tables, giving a snowflake shape.
Any database architecture is built on the foundation of schemas.
Q32. How to create a temp table in SQL Server?
Temporary tables are created in tempdb. A local temporary table (name prefixed with #) is dropped automatically when the session that created it ends. A global temporary table (prefixed with ##) is dropped when the session that created it ends and no other session is still referencing it. Temporary tables are useful for storing and processing interim results.
The following is the syntax for creating and populating a local temporary table:
CREATE TABLE #Employee (id INT, name VARCHAR(25));
INSERT INTO #Employee VALUES (1, 'Ashish'), (2, 'Atul');


Q33. How to install SQL Server in Windows 11?


Install SQL Server Management Studio (SSMS) in Windows 11
SSMS is the management tool; it connects to a SQL Server instance, which is installed separately.
Step 1: Click on SSMS, which takes you to the SQL Server Management Studio download page.
Step 2: Click the SQL Server Management Studio download link and choose Save File.
Step 3: Save the installer to your local drive and open the folder that contains it.
Step 4: Run the installer. In the setup window that appears, you can choose the location where SSMS will be installed.
Step 5: Click Install.
Step 6: Close the window after the installation is complete.
Step 7: Go back to the Start menu and search for SQL Server Management Studio.
Step 8: Open it, and the Connect to Server (login) dialog appears.
Step 9: Your server name should be visible. If it is not, click the drop-down arrow next to the server name and choose Browse.
Step 10: Choose your SQL Server instance and click Connect.
SSMS then connects to the SQL Server instance.
Q34. What is CASE WHEN in SQL Server?
The CASE expression is used to build conditional logic in which one column's value is determined by the values of other columns.
The SQL Server CASE expression consists of at least one pair of WHEN and THEN clauses. The WHEN clause specifies the condition to be tested, and if that condition evaluates to TRUE, the THEN clause specifies the result to return.
When none of the WHEN conditions is true, the ELSE result is returned; if there is no ELSE, the result is NULL. The END keyword closes the CASE expression.

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN conditionN THEN resultN
    ELSE result
END
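Here is a searched CASE expression run in SQLite via Python's sqlite3 module, on hypothetical rows. Conditions are tested from top to bottom, and the first true one wins. A NULL salary makes every comparison unknown rather than true, so that row falls through to ELSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",  # hypothetical rows
                 [("A", 24000), ("B", 34000), ("C", 28000), ("D", None)])

rows = conn.execute("""
    SELECT Name,
           CASE
               WHEN Salary >= 30000 THEN 'high'
               WHEN Salary >= 25000 THEN 'medium'   -- reached only if the first test was not true
               WHEN Salary IS NOT NULL THEN 'low'
               ELSE 'unknown'                       -- NULL salary: no WHEN was true
           END AS band
    FROM Employee
    ORDER BY Name
""").fetchall()
print(rows)  # [('A', 'low'), ('B', 'high'), ('C', 'medium'), ('D', 'unknown')]
```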
Q35. NoSQL vs SQL
In summary, the following are the five major distinctions between SQL and NoSQL:
Relational databases are SQL databases, while non-relational databases are NoSQL.
SQL databases have a predefined schema and use structured query language, whereas NoSQL databases use dynamic schemas for unstructured data.
SQL databases typically scale vertically, whereas NoSQL databases typically scale horizontally.
SQL databases are table-based, whereas NoSQL databases are document, key-value, graph, or wide-column stores.
SQL databases excel at multi-row transactions, while NoSQL databases excel at unstructured data such as documents and JSON.
Q36. What is the difference between NOW() and CURRENT_DATE()?
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
The simple difference between NOW() and CURRENT_DATE() is that NOW() returns both the current date and time, in the format 'YYYY-MM-DD HH:MM:SS', while CURRENT_DATE() returns only the current date, in the format 'YYYY-MM-DD'.
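SQLite has no NOW() function, but its CURRENT_TIMESTAMP and CURRENT_DATE keywords return the same two formats. Both report UTC. That makes them a convenient stand-in for checking the difference. This sketch checks only the formats, not MySQL's time-zone behavior.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# CURRENT_TIMESTAMP plays the role of NOW(), CURRENT_DATE the role of CURRENT_DATE(); both UTC.
now_value, date_value = conn.execute("SELECT CURRENT_TIMESTAMP, CURRENT_DATE").fetchone()

has_time = re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", now_value) is not None
date_only = re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_value) is not None
print(now_value, date_value)  # the current UTC date and time, then the date alone
print(has_time, date_only)    # True True
```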



Q37. What is BLOB and TEXT in MySQL?
BLOB stands for Binary Large Object. A BLOB is used to store binary data, such as images, video, audio, and executable programs, whereas TEXT is used to store large character strings.
BLOB values are treated as binary (byte) strings. As a result, comparison and sorting are based entirely on the numeric values of the bytes.
TEXT values are treated as character (non-binary) strings. Comparison and sorting of TEXT values are based on the collation of their character set.
Q38. How to remove duplicate rows in SQL?
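When a table contains duplicate rows, you usually want to keep one copy of each and delete the rest.
Let's assume the following table as our dataset:
ID Name Age
1 A 21
2 B 23
2 B 23
4 D 22
5 E 25
6 G 26
5 E 25
A query of the form DELETE ... WHERE ID IN (SELECT ID ... GROUP BY ID HAVING COUNT(ID) > 1) would remove every copy of a duplicated row, not just the extras (and its subquery must return a single column). Because the duplicates here are identical in every column, the following SQL Server query numbers the copies of each row and deletes all but the first (assuming the table is named my_table):
WITH numbered AS (
    SELECT ID, Name, Age,
           ROW_NUMBER() OVER (PARTITION BY ID, Name, Age ORDER BY ID) AS rn
    FROM my_table
)
DELETE FROM numbered WHERE rn > 1;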
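The same idea can be tried in SQLite, with one substitution. SQLite cannot delete through a CTE, so this sketch uses SQLite's hidden rowid column to tell identical copies apart and keeps the copy with the smallest rowid. The table name people is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "A", 21), (2, "B", 23), (2, "B", 23), (4, "D", 22),
     (5, "E", 25), (6, "G", 26), (5, "E", 25)],
)

# For each distinct (ID, Name, Age), keep the row with the smallest rowid
# and delete every other copy.
conn.execute(
    """
    DELETE FROM people
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM people GROUP BY ID, Name, Age
    )
    """
)
remaining = conn.execute("SELECT ID, Name, Age FROM people ORDER BY ID").fetchall()
print(remaining)
# [(1, 'A', 21), (2, 'B', 23), (4, 'D', 22), (5, 'E', 25), (6, 'G', 26)]
```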
Q39. How to create a stored procedure using SQL Server?
A stored procedure is a piece of prepared SQL code that you can save and reuse again and again.
So, if you have a SQL query that you write frequently, save it as a stored procedure and then call it to run it.
You can also pass parameters to a stored procedure so that it acts on the value(s) of the parameter(s) given.
Stored Procedure Syntax
CREATE PROCEDURE procedure_name
AS
sql_statement
GO
Execute a Stored Procedure
EXEC procedure_name;
Stored Procedure Syntax with a Parameter
CREATE PROCEDURE procedure_name @param1 datatype
AS
sql_statement
GO
EXEC procedure_name @param1 = value;

Q40. What is Database Black Box Testing?
Black Box Testing is a software testing approach that tests the functions of a software application without knowledge of its internal code structure, implementation details, or internal paths. It focuses on the inputs and outputs of the application and is driven entirely by the software requirements and specifications. It is also called behavioral testing. Applied to a database, it means checking that given inputs (for example, data entered through the application or parameters passed to a stored procedure) produce the expected stored data and query results, without examining the database internals.

Q41. What are the different types of SQL sandbox?


SQL Sandbox is a secure environment within SQL Server where untrusted programs can be run. There are three types of SQL sandbox:
Safe Access Sandbox: A user can perform SQL operations such as creating stored procedures and triggers, but cannot access memory or create files.
External Access Sandbox: Users can access files but cannot manipulate memory allocation.
Unsafe Access Sandbox: Untrusted code can run here, and a user can access memory.
Q42. Where is a MyISAM table stored?
MyISAM was the default storage engine for MySQL versions before 5.5. It is based on the older ISAM code but adds many extra features. Each MyISAM table that is not partitioned is stored on disk as three files. The file names start with the table name and end with an extension that indicates the file type:
• .frm stores the table definition. This file is not part of the MyISAM engine; it belongs to the server. (MySQL 8.0 replaced .frm files with its data dictionary.)
• .MYD (MYData) stores the data.
• .MYI (MYIndex) stores the indexes. If the index file is lost, it can always be rebuilt by recreating the indexes.

Q43. How to find the nth highest salary in SQL?
Finding the Nth highest salary in a table is one of the most common interview questions. It can be solved with the DENSE_RANK() window function.
Employee table
employee_name salary
A 24000
C 34000
D 55000
E 75000
F 21000
G 40000
H 50000
SELECT * FROM (
    SELECT employee_name, salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS r
    FROM Employee
) ranked
WHERE r = &n;
Here &n is an Oracle SQL*Plus substitution variable; in other clients, replace it with the number you want.
To find the 2nd highest salary, set n = 2.
To find the 3rd highest salary, set n = 3, and so on.
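The query can be tried out against the Employee table listed above using SQLite 3.25 or later, which added window functions, through Python's sqlite3 module. In this sketch, a bound parameter stands in for &n, and the helper name nth_highest is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("A", 24000), ("C", 34000), ("D", 55000), ("E", 75000),
     ("F", 21000), ("G", 40000), ("H", 50000)],
)

def nth_highest(n):
    """Return the (employee_name, salary) rows whose salary ranks n-th from the top."""
    # The ? placeholder replaces the SQL*Plus substitution variable &n.
    return conn.execute(
        """
        SELECT employee_name, salary FROM (
            SELECT employee_name, salary,
                   DENSE_RANK() OVER (ORDER BY salary DESC) AS r
            FROM Employee
        ) AS ranked
        WHERE r = ?
        """,
        (n,),
    ).fetchall()

print(nth_highest(2))  # [('D', 55000)]
print(nth_highest(3))  # [('H', 50000)]
```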
Q44. What do you mean by table and field in SQL?
A table is a collection of data organised in rows and columns. A field is a column in a table. For example:
Table: StudentInformation
Fields: Stu Id, Stu Name, Stu Marks
Q45. What are joins in SQL?
A JOIN clause combines rows from two or more tables based on a related column between them. There are four main types of joins:
• Inner join: The most common type of join. It returns only the rows from both tables that satisfy the join condition.
• Left join: Returns all the rows from the left table and the matching rows from the right table; where there is no match, the right table's columns are NULL.
• Right join: Returns all the rows from the right table and the matching rows from the left table; where there is no match, the left table's columns are NULL.
• Full join: Returns all the rows from both tables, matching them where the join condition holds and filling in NULLs on either side where it does not.
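The difference between an inner join and a left join is easiest to see on a customer who has no orders. This is a minimal sketch using Python's sqlite3 module with hypothetical customers and orders tables. It sticks to INNER and LEFT JOIN because SQLite only gained RIGHT and FULL joins in version 3.39; a right join is equivalent to a left join with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])

# INNER JOIN: only customers with at least one matching order.
inner = conn.execute(
    """SELECT c.name, o.id FROM customers c
       INNER JOIN orders o ON o.customer_id = c.id
       ORDER BY o.id"""
).fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when nothing matches.
left = conn.execute(
    """SELECT c.name, o.id FROM customers c
       LEFT JOIN orders o ON o.customer_id = c.id
       ORDER BY c.id, o.id"""
).fetchall()

print(inner)  # [('Ann', 10), ('Ann', 11), ('Bob', 12)]
print(left)   # [('Ann', 10), ('Ann', 11), ('Bob', 12), ('Cy', None)]
```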
Q46. What is the difference between CHAR and VARCHAR2 datatype in SQL?
Both CHAR and VARCHAR2 store character data, but CHAR is used for fixed-length strings whereas VARCHAR2 is used for variable-length strings. For example, CHAR(10) always occupies 10 characters: a shorter value is padded with trailing spaces to length 10. VARCHAR2(10) stores strings of up to 10 characters (for example, of length 2, 6, or 8) and uses only the length of the actual value.
Q47. What is a Primary key?

A primary key in SQL is a column (or set of columns) that uniquely identifies each row in the table.
• Uniquely identifies a single row in the table
• Null values are not allowed
• Only one primary key is allowed per table
Example: In the Student table, Stu_ID is the primary key.
Q48. What are Constraints?
Constraints in SQL specify rules that limit the data that can be stored in a table. They can be specified when creating a table (CREATE TABLE) or when altering it (ALTER TABLE). Common constraints are:
• NOT NULL
• CHECK
• DEFAULT
• UNIQUE
• PRIMARY KEY
• FOREIGN KEY
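Here is a minimal sketch showing most of these constraints rejecting bad data. It runs in SQLite through Python's sqlite3 module, and the Student columns (including Email and the 'India' default for Country) are hypothetical. Each rejected insert surfaces as sqlite3.IntegrityError, and FOREIGN KEY is shown separately under the foreign key question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        Stu_ID    INTEGER PRIMARY KEY,            -- unique identifier per row
        Email     TEXT UNIQUE,                    -- no two rows may share a value
        Stu_Name  TEXT NOT NULL,                  -- a value is required
        Stu_Marks INTEGER CHECK (Stu_Marks BETWEEN 0 AND 100),
        Country   TEXT DEFAULT 'India'            -- used when no value is given
    )
    """
)
conn.execute("INSERT INTO Student (Stu_ID, Email, Stu_Name, Stu_Marks) "
             "VALUES (1, 'a@example.com', 'Asha', 88)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

checks = {
    "primary key": violates("INSERT INTO Student VALUES (1, 'b@example.com', 'Ravi', 70, 'UK')"),
    "unique": violates("INSERT INTO Student VALUES (2, 'a@example.com', 'Ravi', 70, 'UK')"),
    "not null": violates("INSERT INTO Student VALUES (3, 'c@example.com', NULL, 70, 'UK')"),
    "check": violates("INSERT INTO Student VALUES (4, 'd@example.com', 'Mina', 150, 'UK')"),
}
default_country = conn.execute("SELECT Country FROM Student WHERE Stu_ID = 1").fetchone()[0]
print(checks, default_country)  # every check is True; default_country is 'India'
```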
Q49. What is the difference between DELETE and TRUNCATE statements?
DELETE vs TRUNCATE
DELETE | TRUNCATE
DELETE removes rows from a table: specific rows (filtered with WHERE) or all of them. | TRUNCATE removes all the rows from a table.
You can roll back the data after a DELETE statement. | In Oracle and MySQL you cannot roll back the data; SQL Server and PostgreSQL allow a TRUNCATE inside an explicit transaction to be rolled back.
It is a DML command. | It is a DDL command.
It is slower than TRUNCATE, because it logs each deleted row. | It is faster.
Q50. What is a Unique key?

• Uniquely identifies a single row in the table.
• A table can have multiple unique keys.
• NULL values are allowed (SQL Server permits only one NULL in a unique column; most other databases allow several).
Q51. What is a Foreign key in SQL?

• A foreign key maintains referential integrity by enforcing a link between the data in two tables.
• The foreign key in the child table references the primary key in the parent table.
• The foreign key constraint prevents actions that would destroy links between the child and parent tables.
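Both kinds of protective action can be seen in SQLite from Python's sqlite3 module, with one caveat: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been set on the connection. The Department and Employee tables here are hypothetical. Inserting a child row with no matching parent fails, and so does deleting a parent row that is still referenced:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE Employee (
           emp_id  INTEGER PRIMARY KEY,
           dept_id INTEGER REFERENCES Department (dept_id)
       )"""
)
conn.execute("INSERT INTO Department VALUES (1, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (100, 1)")  # valid: the parent row exists

def rejected(sql):
    """Return True if the statement breaks the parent/child link."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

orphan_insert = rejected("INSERT INTO Employee VALUES (101, 99)")     # no department 99
parent_delete = rejected("DELETE FROM Department WHERE dept_id = 1")  # still referenced
print(orphan_insert, parent_delete)  # True True
```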
Q52. What do you mean by data integrity? 
Data integrity refers to the accuracy and consistency of the data stored in a database. Integrity constraints are used to enforce business rules on the data when it is entered into an application or a database.
Q53. What is the difference between clustered and non-clustered index in SQL?
The differences between clustered and non-clustered indexes in SQL are:
1. A clustered index makes retrieval of data faster, whereas reading through a non-clustered index is relatively slower because it often needs an extra lookup to reach the table row.
2. A clustered index determines how records are stored: it sorts the table's rows by the clustered index key. A non-clustered index does not change how the rows are stored; instead, it is a separate structure that points back to the original table rows.
3. A table can have only one clustered index, but it can have many non-clustered indexes.
Q54. Write a SQL query to display the current date?
In SQL Server, the built-in function GETDATE() returns the current date and time:
SELECT GETDATE();
To get only the date, cast the result: SELECT CAST(GETDATE() AS DATE);
Q55. What do you understand by query optimization?
Query optimization is the phase in which the database identifies, among the possible plans for evaluating a query, the one with the lowest estimated cost.
The advantages of query optimization are as follows:
• The output is provided faster.
• A larger number of queries can be executed in less time.
• It reduces time and space complexity.
Q56. What do you mean by Denormalization?
Denormalization is a technique that moves a database from higher to lower normal forms by deliberately introducing redundancy into a table. It adds redundant data to a table by combining data from several tables into one, so that frequent queries need fewer joins. This can improve the performance of the entire infrastructure, at the cost of extra storage and more work to keep the redundant copies consistent on updates.
Q57. What are Entities and Relationships?
Entities: A person, place, or thing in the real world about which data can be stored in a database. Tables store data that represents one type of entity. For example, a bank database has a customer table to store customer information. The customer table stores this information as a set of attributes (columns within the table) for each customer.
Relationships: Links between entities that are related to each other. For example, the customer name is related to the customer account number and contact information, which might be in the same table. There can also be relationships between separate tables (for example, customer to accounts).

Q58. What is an Index?
An index is a performance-tuning structure that allows faster retrieval of records from a table. It creates an entry for each indexed value, so the database can locate matching rows without scanning the whole table.
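You can watch the optimizer (see query optimization above) switch from a full scan to an index. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the employee table and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(f"emp{i}", f"dept{i % 10}", 1000 + i) for i in range(1000)],
)

query = "SELECT name, salary FROM employee WHERE dept = 'dept3'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # e.g. 'SCAN employee': every row is read
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept)")
after = plan(query)   # e.g. 'SEARCH employee USING INDEX idx_emp_dept (dept=?)'
print(before)
print(after)
```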
Q59. Explain different types of index in SQL.
There are three types of index in SQL, namely:
Unique Index:
This index does not allow duplicate values in the indexed column. When a primary key is defined, a unique index can be created automatically.
Clustered Index:
This index sets the physical order of the table's rows according to the key values, and searches are based on those key values. Each table can have only one clustered index.
Non-Clustered Index:
A non-clustered index does not alter the physical order of the table; it maintains a separate logical ordering of the data. Each table can have many non-clustered indexes.
Q60. What is Normalization and what are the advantages of it?
Normalization in SQL is the process of organizing data to avoid duplication and redundancy. Some of the advantages are:
• Better database organization
• More tables with smaller rows
• Efficient data access
• Greater flexibility for queries
• Quickly find the information
• Easier to implement security
• Allows easy modification
• Reduction of redundant and duplicate data
• More compact database
• Ensures consistent data after modification
Q61. What is the difference between DROP and TRUNCATE commands?
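DROP removes the table itself (its structure and its data) from the database, whereas TRUNCATE removes all the rows but keeps the table structure. In Oracle and MySQL neither can be rolled back; SQL Server and PostgreSQL allow both to be rolled back inside an explicit transaction.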
Q62. Explain different types of Normalization.
There are many successive levels of normalization, called normal forms. Each normal form builds on the previous one, and the first three normal forms are usually adequate.
Normal forms are used in database tables to remove or reduce duplication. The first three are:
First Normal Form:
A relation is in first normal form when every attribute is single-valued. A relation with a composite or multi-valued attribute violates first normal form.
Second Normal Form:
A relation is in second normal form if it meets the requirements of first normal form and contains no partial dependencies, that is, no non-prime attribute depends on a proper subset of any candidate key. Often, the problem can be solved by using a single-column primary key.
Third Normal Form:
A relation is in third normal form if it meets the requirements of second normal form and has no transitive dependency, that is, no non-prime attribute depends on a candidate key through another non-prime attribute.
Q63. What is OLTP?
OLTP, or online transaction processing, allows large numbers of people to execute large volumes of database transactions in real time, usually over the internet. A database transaction occurs when data in a database is changed, inserted, deleted, or queried.
What are the differences between OLTP and OLAP?
OLTP stands for online transaction processing, whereas OLAP stands for online analytical processing. OLTP is an online database modification system, whereas OLAP is an online database query response system.
Q64. How to create empty tables with the same structure as another table?
To create an empty table with the same structure, select the records of the existing table INTO a new table, using a WHERE clause that is false for every row. SQL creates the new table with the same column structure to accept the fetched rows, but nothing is stored in it because no row satisfies the WHERE clause. In SQL Server:
SELECT * INTO NewTable FROM OldTable WHERE 1 = 0;
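SQLite has no SELECT ... INTO, so this sketch swaps in CREATE TABLE ... AS SELECT, which MySQL, PostgreSQL, and Oracle also support, with the same always-false WHERE clause. It runs from Python's sqlite3 module, and the table names are hypothetical. As with SELECT INTO, the copy gets the columns but not the original's constraints or indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_ID INTEGER, Stu_Name TEXT, Stu_Marks INTEGER)")
conn.execute("INSERT INTO Student VALUES (1, 'Asha', 88)")

# WHERE 1 = 0 is false for every row, so only the column structure is copied.
conn.execute("CREATE TABLE Student_copy AS SELECT * FROM Student WHERE 1 = 0")

columns = [row[1] for row in conn.execute("PRAGMA table_info(Student_copy)")]
row_count = conn.execute("SELECT COUNT(*) FROM Student_copy").fetchone()[0]
print(columns, row_count)  # ['Stu_ID', 'Stu_Name', 'Stu_Marks'] 0
```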

Q65. What is PostgreSQL?
In 1986, a team led by Computer Science Professor Michael Stonebraker created PostgreSQL under the name Postgres. It was created to help developers build enterprise-level applications by ensuring data integrity and fault tolerance in systems. PostgreSQL is an enterprise-level, versatile, resilient, open-source, object-relational database management system that supports variable workloads and concurrent users. It has been consistently backed by the international developer community, and its fault-tolerant characteristics have made it very popular among developers.
It is a very reliable database management system, with more than two decades of community work behind its high levels of resiliency, integrity, and accuracy. Many web, mobile, geospatial, and analytics applications use PostgreSQL as their primary data store or data warehouse.
Q66. What are SQL comments?
SQL comments are used to explain portions of SQL statements or to prevent SQL statements from being executed. Note that Microsoft Access databases do not support comments.
Single-line comments: start with two consecutive hyphens (--).
Multi-line comments: start with /* and end with */.
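The sketch below uses both comment styles, run through Python's sqlite3 module against a hypothetical Student table. The multi-line comment disables an extra filter, so only the marks >= 50 condition is applied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("Asha", 88), ("Ravi", 45), ("Mina", 72)])

sql = """
-- Single-line comment: list students who passed.
SELECT name
FROM Student
WHERE marks >= 50
/* Multi-line comment: the extra filter below is disabled,
   so it is not executed.
AND marks >= 80 */
ORDER BY name
"""
passed = conn.execute(sql).fetchall()
print(passed)  # [('Asha',), ('Mina',)]
```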
Q67. What is the difference between the RANK() and DENSE_RANK() functions?
The RANK() function assigns a rank to each row within its ordered partition of the result set. Rows with equal values receive the same rank, and the next rank skips ahead by the number of tied rows. If we have three records at rank 4, for example, the next rank is 7.
The DENSE_RANK() function also gives tied rows the same rank, but it leaves no gaps: the next rank is always the next consecutive number. If we have three records at rank 4, for example, the next rank is 5.
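With three players tied for second place, the two functions diverge on the next row. This minimal sketch requires SQLite 3.25 or later for window functions and runs through Python's sqlite3 module; the scores table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("P1", 100), ("P2", 90), ("P3", 90), ("P4", 90), ("P5", 80)])

rows = conn.execute(
    """
    SELECT player, points,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
    """
).fetchall()
for row in rows:
    print(row)
# ('P1', 100, 1, 1)
# ('P2', 90, 2, 2)
# ('P3', 90, 2, 2)
# ('P4', 90, 2, 2)
# ('P5', 80, 5, 3)   RANK skips 3 and 4; DENSE_RANK does not
```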
Q68. What is SQL Injection?
SQL injection is a type of vulnerability in website and web application code that allows attackers to take control of back-end processes and to access, retrieve, and delete sensitive data stored in databases. In this attack, malicious SQL statements are entered into an input field, and the database is exposed to the attacker once those statements are executed. Attackers widely use this technique against data-driven applications to access sensitive data and perform administrative tasks on databases. It is also called an SQLi attack.
The following are some examples of SQL injection:
• Retrieving hidden data by modifying a SQL query to return additional results.
• UNION attacks, which retrieve data from several database tables.
• Examining the database to get information about its version and structure.
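This illustrative sketch shows the classic "retrieve hidden data" case and the usual defence, a parameterized query. It uses Python's sqlite3 module, and the users table and input string are hypothetical. Other drivers use different placeholder syntax, but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

user_input = "nobody' OR '1'='1"  # attacker-controlled text

# UNSAFE: the input is pasted into the SQL text, so its quote closes the
# string literal and OR '1'='1' makes the WHERE clause true for every row.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()

# SAFER: a bound parameter is always treated as a value, never as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(leaked)  # returns both rows
print(safe)    # []
```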
Q69. How many Aggregate functions are available in SQL?
SQL aggregate functions compute a single result from a set of values. AVG, for example, returns the average of a column's values.
Commonly listed aggregate functions are:
AVG(): returns the average of the non-null values in a column.
COUNT(): COUNT(*) returns the number of rows, including rows with null values; COUNT(column) counts only the non-null values.
MAX(): returns the largest value in the group.
MIN(): returns the smallest value in the group.
SUM(): returns the sum of the non-null values in a column.
FIRST(): returns the first value of an expression (available in Microsoft Access, not in standard SQL).
LAST(): returns the last value of an expression (available in Microsoft Access, not in standard SQL).
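The NULL handling described above is easy to check. In this sketch, run through Python's sqlite3 module against a hypothetical sales table, one amount is NULL. COUNT(*) still counts that row, but COUNT(amount), SUM, and AVG skip it. The second query applies SUM per group with GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("N", 10), ("N", 20), ("S", None), ("S", 30)])

row = conn.execute(
    """SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount),
              MIN(amount), MAX(amount)
       FROM sales"""
).fetchone()
print(row)  # (4, 3, 60, 20.0, 10, 30): AVG is 60 / 3, not 60 / 4

# One result per group when combined with GROUP BY.
per_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(per_region)  # [('N', 30), ('S', 30)]
```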
Q70. What is the default ordering of data using the ORDER BY clause? How could it be changed?
The ORDER BY clause in MySQL can be used without the ASC or DESC modifiers. When neither is given, the sort order defaults to ASC (ascending). To change it, add DESC after the column, for example ORDER BY salary DESC.
Q71. How do we use the DISTINCT statement? What is its use?
The SQL DISTINCT keyword is combined with the SELECT query to remove all
duplicate records and return only unique records. There may be times when a table has
several duplicate records.
The DISTINCT clause in SQL is used to eliminate duplicates from a SELECT
statement’s result set.
Q72. What are the syntax and use of the COALESCE function?
The COALESCE function returns the first non-NULL value from a list of expressions. The expressions are evaluated in the order they are supplied, and the result is the first non-null value. COALESCE returns NULL only if all of the arguments are null.
The syntax of the COALESCE function is COALESCE(exp1, exp2, ..., expn).
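A typical use of COALESCE is picking the first available contact number and falling back to a fixed label. The contacts table in this sketch is hypothetical, and the query runs through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, home TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    ("Asha", "98450", "080-111"),
    ("Ravi", None, "080-222"),
    ("Mina", None, None),
])

# The first non-NULL argument wins; the literal at the end acts as a fallback.
rows = conn.execute(
    """SELECT name, COALESCE(mobile, home, 'no phone') AS phone
       FROM contacts ORDER BY name"""
).fetchall()
print(rows)  # [('Asha', '98450'), ('Mina', 'no phone'), ('Ravi', '080-222')]
```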
 
Q73. What is the ACID property in a database?
ACID stands for Atomicity, Consistency, Isolation, Durability. It is used to ensure that data transactions are processed reliably in a database system.
• Atomicity: A transaction, a single logical operation on the data, is either completely done or completely failed. If one part of a transaction fails, the entire transaction fails and the database state is left unchanged.
• Consistency: The data must meet all the validation rules. In simple words, a transaction always takes the database from one valid state to another and never leaves it half-finished.
• Isolation: The main goal of isolation is concurrency control: concurrent transactions do not interfere with one another, so each behaves as if it were running alone.
• Durability: Once a transaction has been committed, its changes persist even if a power loss, crash, or any other error occurs afterwards.
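Atomicity can be demonstrated with a transfer between two accounts. In this minimal sketch (Python's sqlite3 module, hypothetical accounts table and transfer helper), the credit succeeds, but the debit then breaks a CHECK rule, so the whole transaction is rolled back. The `with conn:` block commits on success and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money as one transaction; return True if it was committed."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
    except sqlite3.IntegrityError:
        return False
    return True

ok = transfer(1, 2, 130)  # the debit would make account 1 negative
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, balances)  # False {1: 100, 2: 50}: the credit to account 2 was undone too
```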
Q74. What do you mean by “Trigger” in SQL?
A trigger in SQL is a special type of stored procedure that executes automatically instead of, or after, a data modification. It lets you run a batch of code whenever an INSERT, UPDATE, DELETE, or other specified statement is executed against a specific table.
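The sketch below keeps an audit log with an AFTER INSERT trigger. It is written in SQLite syntax and run from Python's sqlite3 module; SQLite uses row-level triggers with a NEW row reference, whereas SQL Server triggers are statement-level and read an inserted pseudo-table instead. The orders and order_audit tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE order_audit (order_id INTEGER, note TEXT)")

# AFTER INSERT trigger: runs automatically once for every row added to orders.
conn.execute(
    """
    CREATE TRIGGER trg_orders_audit
    AFTER INSERT ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO order_audit VALUES (NEW.id, 'inserted');
    END
    """
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 250), (2, 90)])
audit = conn.execute("SELECT order_id, note FROM order_audit ORDER BY order_id").fetchall()
print(audit)  # [(1, 'inserted'), (2, 'inserted')]
```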
Q75. What are the different operators available in SQL?
There are three main categories of operators in SQL, namely:

1. Arithmetic Operators
2. Logical Operators
3. Comparison Operators
Q76. Are NULL values the same as zero or a blank space?
No. A NULL value is not the same as zero or a blank space. NULL represents a value that is unavailable, unknown, unassigned, or not applicable, whereas zero is a number and a blank space is a character.
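The sketch below stores the number zero, a single blank space, and NULL in one column, then queries each of them. It runs through Python's sqlite3 module against a hypothetical table t, which has no declared column type so that SQLite keeps each value exactly as given. Note that an = comparison with NULL never matches, which is why IS NULL exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared type, so SQLite keeps 0, ' ' and NULL exactly as given.
conn.execute("CREATE TABLE t (v)")
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (" ",), (None,)])

def count(where):
    # Only called with the fixed conditions below, never with user input.
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]

zero_rows = count("v = 0")     # 1: only the number zero
blank_rows = count("v = ' '")  # 1: only the blank space
eq_null = count("v = NULL")    # 0: comparing with NULL is never true
is_null = count("v IS NULL")   # 1: IS NULL is the correct test
print(zero_rows, blank_rows, eq_null, is_null)  # 1 1 0 1
```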
Q77. What is the difference between cross join and natural join?
The cross join produces the cross product or Cartesian product of two tables whereas
the natural join is based on all the columns having the same name and data types in
both the tables.
Q78. What is subquery in SQL?
A subquery is a query nested inside another query to retrieve data from the database. The outer query is called the main query and the inner query is called the subquery. A non-correlated subquery is conceptually evaluated first, and its result is passed on to the main query; a correlated subquery, by contrast, is conceptually evaluated once for each row of the outer query. A subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement, and it can be used with comparison operators such as >, <, or =.
Q79. What are the different types of a subquery?
There are two types of subquery, namely correlated and non-correlated.
Correlated subquery: A subquery that refers to columns of a table in the outer query. It is not an independent query, because it depends on values from the outer query and is conceptually re-evaluated for each outer row.
Non-correlated subquery: An independent query whose output is substituted into the main query.
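Both types can be run side by side on a hypothetical emp table, using Python's sqlite3 module. The non-correlated version compares each salary with one overall average. The correlated version references the outer row's department, so each employee is compared with their own department's average:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("A", "sales", 100), ("B", "sales", 200),
    ("C", "it", 300), ("D", "it", 500),
])

# Non-correlated: the inner query stands alone (overall average = 275).
above_overall = conn.execute(
    """SELECT name FROM emp
       WHERE salary > (SELECT AVG(salary) FROM emp)
       ORDER BY name"""
).fetchall()

# Correlated: the inner query uses outer_emp.dept, so it is evaluated per
# outer row (department averages: sales = 150, it = 400).
above_dept = conn.execute(
    """SELECT name FROM emp AS outer_emp
       WHERE salary > (SELECT AVG(salary) FROM emp AS inner_emp
                       WHERE inner_emp.dept = outer_emp.dept)
       ORDER BY name"""
).fetchall()
print(above_overall)  # [('C',), ('D',)]
print(above_dept)     # [('B',), ('D',)]
```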
Q80. List the ways to get the count of records in a table?
To count the number of records in a table, you can use the following commands:
SELECT * FROM table1  -- returns every row; the client reports how many came back
SELECT COUNT(*) FROM table1
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table1') AND indid < 2  -- SQL Server only

Q81. Write a SQL query to find the names of employees that begin with ‘A’?
To display the names of employees that begin with 'A', use the following command:
SELECT * FROM Table_name WHERE EmpName LIKE 'A%';
Q82. Write a SQL query to get the third-highest salary of an employee from employee_table?
SELECT TOP 1 salary
FROM (
    SELECT DISTINCT TOP 3 salary
    FROM employee_table
    ORDER BY salary DESC
) AS emp
ORDER BY salary ASC;
This uses SQL Server's TOP; DISTINCT ensures that tied salaries are counted only once.
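In databases without TOP, such as SQLite, MySQL, and PostgreSQL, the usual equivalent is LIMIT with OFFSET. This sketch runs that swapped-in form through Python's sqlite3 module with hypothetical salaries, including a tie at 75000 to show why DISTINCT matters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee_table VALUES (?, ?)", [
    ("A", 90000), ("B", 75000), ("C", 75000), ("D", 60000), ("E", 40000),
])

# DISTINCT collapses the tie, giving 90000, 75000, 60000, 40000;
# OFFSET 2 skips the two highest values and LIMIT 1 keeps the next one.
third = conn.execute(
    """SELECT DISTINCT salary FROM employee_table
       ORDER BY salary DESC
       LIMIT 1 OFFSET 2"""
).fetchone()[0]
print(third)  # 60000
```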
Q83. What is the need for group functions in SQL? 
Group functions work on the set of rows and return one result per group. Some of the
commonly used group functions are: AVG, COUNT, MAX, MIN, SUM, VARIANCE.
Q84. What is a Relationship and what are its types?
Relationships are links between entities that are related to each other; in a database, they are the connections between tables. There are various types of relationships, namely:
• One-to-one relationship
• One-to-many relationship
• Many-to-one relationship
• Self-referencing relationship
Q85. How can you insert NULL values in a column while inserting the data?
NULL values in SQL can be inserted in the following ways:
• Implicitly, by omitting the column from the column list.
• Explicitly, by specifying the NULL keyword in the VALUES clause.
Q86. What is the main difference between ‘BETWEEN’ and ‘IN’ condition operators?
The BETWEEN operator selects rows whose column value lies within a range (both end values are included), whereas the IN operator checks whether a value is contained in a specific set of values.
Example of BETWEEN:
SELECT * FROM Students WHERE ROLL_NO BETWEEN 10 AND 50;
Example of IN:
SELECT * FROM Students WHERE ROLL_NO IN (8, 15, 25);

Q87. Why are SQL functions used?
SQL functions are used for the following purposes:
• To perform calculations on the data
• To modify individual data items
• To manipulate the output
• To format dates and numbers
• To convert data types
Q88. What is the need for MERGE statement?
This statement allows conditional update or insertion of data into a table. It performs an
UPDATE if a row exists, or an INSERT if the row does not exist.
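SQLite has no MERGE statement, so this sketch swaps in SQLite's upsert form, INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24 or later), to show the same update-or-insert behaviour; in SQL Server or Oracle you would write MERGE. The stock table and upsert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('pen', 10)")

def upsert(item, qty):
    # Insert a new row, or update the existing row when the key already exists.
    # 'excluded' refers to the row that was proposed for insertion.
    conn.execute(
        """INSERT INTO stock (item, qty) VALUES (?, ?)
           ON CONFLICT (item) DO UPDATE SET qty = excluded.qty""",
        (item, qty),
    )

upsert("pen", 25)    # 'pen' exists, so it is updated
upsert("pencil", 5)  # 'pencil' is new, so it is inserted
stock = conn.execute("SELECT item, qty FROM stock ORDER BY item").fetchall()
print(stock)  # [('pen', 25), ('pencil', 5)]
```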
Q89. What do you mean by recursive stored procedure?
A recursive stored procedure is a stored procedure that calls itself until it reaches a boundary condition. Recursion lets programmers reuse the same code any number of times. (SQL Server limits this nesting to 32 levels.)
Q90. What is CLAUSE in SQL?
SQL clause helps to limit the result set by providing a condition to the query. A clause
helps to filter the rows from the entire set of records.
For example – WHERE, HAVING clause.
Q91. What is the difference between ‘HAVING’ CLAUSE and a ‘WHERE’ CLAUSE?
The HAVING clause can be used only in a SELECT statement. It is usually used together with GROUP BY; when GROUP BY is not used, the whole result is treated as a single group.
The WHERE clause filters individual rows before they are grouped, whereas the HAVING clause filters the groups produced by GROUP BY and can therefore use aggregate functions such as SUM() or COUNT().
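The following query uses both clauses on a hypothetical orders table and runs through Python's sqlite3 module. WHERE first drops the individual 5-unit order. GROUP BY then totals each customer's remaining orders, and HAVING keeps only the totals above 20:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("a", 10), ("a", 50), ("b", 5), ("b", 6), ("c", 100),
])

rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 5            -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 20     -- group filter, may use aggregates
    ORDER BY customer
    """
).fetchall()
print(rows)  # [('a', 60), ('c', 100)]: customer b's remaining total is only 6
```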
Q92. List the ways in which Dynamic SQL can be executed?
Following are the ways in which dynamic SQL can be executed:
• Writing a query with parameters.
• Using EXEC.
• Using sp_executesql.
Q93. What are the various levels of constraints?
Constraints are rules on a column or table that enforce data integrity and consistency. There are two levels of constraint, namely:
• Column-level constraint
• Table-level constraint
Q94. How can you fetch common records from two tables?
You can fetch common records from two tables using INTERSECT. For example:
SELECT StudentID FROM Student INTERSECT SELECT StudentID FROM Exam;
Q95. List some case manipulation functions in SQL?
Three common case manipulation functions in SQL are:
LOWER: Takes a string as an argument and returns it converted to lowercase. Syntax:
LOWER('string')
UPPER: Takes a string as an argument and returns it converted to uppercase. Syntax:
UPPER('string')
INITCAP: Returns the string with the first letter of each word in uppercase and the remaining letters in lowercase (available in Oracle and PostgreSQL). Syntax:
INITCAP('string')

Q96. What are the different set operators available in SQL?
Some of the available set operators are UNION, UNION ALL, INTERSECT, and MINUS (called EXCEPT in SQL Server, PostgreSQL, and SQLite).
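The sketch below runs each set operator on two small hypothetical tables through Python's sqlite3 module. The Student and Exam names follow the INTERSECT example above, and EXCEPT stands in for Oracle's MINUS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID INTEGER)")
conn.execute("CREATE TABLE Exam (StudentID INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO Exam VALUES (?)", [(2,), (3,), (4,)])

def run(op):
    # ORDER BY 1 sorts the combined result by its first column.
    sql = f"SELECT StudentID FROM Student {op} SELECT StudentID FROM Exam ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]

print(run("UNION"))      # [1, 2, 3, 4]        duplicates removed
print(run("UNION ALL"))  # [1, 2, 2, 3, 3, 4]  duplicates kept
print(run("INTERSECT"))  # [2, 3]              rows present in both
print(run("EXCEPT"))     # [1]                 Oracle calls this MINUS
```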
Q97. What is an ALIAS command?
An alias is a temporary name given to a table or a column in a query, usually with the AS keyword. The alias can then be used, for example in the WHERE clause, to refer to that table or column.
For example:
SELECT emp.empID, dept.Result FROM employee emp, department AS dept
WHERE emp.empID = dept.empID
In the above example, emp is the alias for the employee table and dept is the alias for the department table.
Q98. What are aggregate and scalar functions?
Aggregate functions perform a calculation on a set of values taken from a column and return a single value, for example MAX() and COUNT().
Scalar functions return a single value for each input value, for example UCASE(), which converts a string to uppercase, and NOW(), which returns the current date and time.
Q99. How can you fetch alternate records from a table?
You can fetch alternate records, that is, the rows with odd or even row numbers. For example, in Oracle, to display the even-numbered rows, use the following command:
SELECT studentId FROM (SELECT ROWNUM AS rowno, studentId FROM student) WHERE MOD(rowno, 2) = 0;
Now, to display the odd-numbered rows:
SELECT studentId FROM (SELECT ROWNUM AS rowno, studentId FROM student) WHERE MOD(rowno, 2) = 1;
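SQLite has no ROWNUM, so this sketch swaps in the standard ROW_NUMBER() window function (SQLite 3.25 or later) and the % operator in place of MOD. It runs through Python's sqlite3 module with hypothetical student IDs, and the rows_with_parity helper is also hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studentId INTEGER)")
conn.executemany("INSERT INTO student VALUES (?)",
                 [(101,), (102,), (103,), (104,), (105,)])

def rows_with_parity(remainder):
    # ROW_NUMBER() numbers the rows 1, 2, 3, ... in studentId order.
    return [r[0] for r in conn.execute(
        """SELECT studentId FROM (
               SELECT ROW_NUMBER() OVER (ORDER BY studentId) AS rowno, studentId
               FROM student
           ) AS numbered
           WHERE rowno % 2 = ?
           ORDER BY studentId""",
        (remainder,),
    )]

print(rows_with_parity(0))  # even-numbered rows: [102, 104]
print(rows_with_parity(1))  # odd-numbered rows:  [101, 103, 105]
```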
Q100. Name the operator which is used in the query for pattern matching?
The LIKE operator is used for pattern matching, with two wildcards:
1. % (percent): matches zero or more characters.
For example: SELECT * FROM students WHERE studentname LIKE 'a%'
2. _ (underscore): matches exactly one character.
For example: SELECT * FROM students WHERE studentname LIKE 'abc_'
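Both wildcards can be tried against a hypothetical students table through Python's sqlite3 module. One caveat: SQLite's LIKE is case-insensitive for ASCII letters by default, while other databases may behave differently depending on collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (studentname TEXT)")
conn.executemany("INSERT INTO students VALUES (?)",
                 [("abc",), ("abcd",), ("abcde",), ("adam",), ("bob",)])

def match(pattern):
    return [r[0] for r in conn.execute(
        "SELECT studentname FROM students WHERE studentname LIKE ? ORDER BY studentname",
        (pattern,),
    )]

print(match("a%"))    # ['abc', 'abcd', 'abcde', 'adam']: zero or more characters
print(match("abc_"))  # ['abcd']: exactly one character after 'abc'
```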
Q101. How can you select unique records from a table?
You can select unique records from a table by using the DISTINCT keyword.
SELECT DISTINCT studentID FROM Student
This command returns each student ID in the Student table only once.
Q102. How can you fetch first 5 characters of the string?
There are several ways to fetch characters from a string. For example, in SQL Server and MySQL (Oracle uses SUBSTR):
SELECT SUBSTRING(StudentName, 1, 5) AS studentname FROM student
Q103. What is the main difference between SQL and PL/SQL?
SQL is a query language that allows you to issue a single query or execute a single
insert/update/delete whereas PL/SQL is Oracle’s “Procedural Language” SQL, which
allows you to write a full program (loops, variables, etc.) to accomplish multiple
operations such as selects/inserts/updates/deletes. 
Q104. What is a View?
A view is a virtual table that consists of a subset of the data contained in one or more tables. Because a view stores only its defining query rather than the data itself, it takes very little space to store. A view can combine data from one or more tables, depending on the relationship between them.
Q105. What are Views used for?
A view is a logical snapshot based on a table or another view. It is used for the following reasons:
• Restricting access to data.
• Making complex queries simple.
• Ensuring data independence.
• Providing different views of the same data.
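This minimal sketch shows a view used to restrict access. It runs through Python's sqlite3 module, with hypothetical table and view names. The view exposes names and departments but hides salaries, and because it stores only its query, a row added to the base table appears in the view immediately:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("A", "sales", 100), ("B", "it", 300)])

# The view stores only its query; the salary column is not exposed.
conn.execute("CREATE VIEW employee_public AS SELECT name, dept FROM employee")

# The view is evaluated on each query, so new base-table rows show up in it.
conn.execute("INSERT INTO employee VALUES ('C', 'hr', 200)")
view_rows = conn.execute("SELECT * FROM employee_public ORDER BY name").fetchall()
print(view_rows)  # [('A', 'sales'), ('B', 'it'), ('C', 'hr')]
```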
Q106. What is a Stored Procedure?
A stored procedure is a named group of SQL statements stored in the database. Several SQL statements are consolidated into a stored procedure, which can be executed whenever and wherever required; this saves time and avoids writing the same code again and again.
Q107. List some advantages and disadvantages of Stored Procedure?
Advantages:
A stored procedure supports modular programming: create it once, store it, and call it as many times as required. This supports faster execution. It also reduces network traffic and provides better security for the data.
Disadvantage:
A key disadvantage is that a stored procedure can be executed only in the database and uses more memory on the database server.
Q108. List all the types of user-defined functions?
There are three types of user-defined functions, namely:
• Scalar functions
• Inline table-valued functions
• Multi-statement table-valued functions
A scalar function returns a single value of the data type defined in its RETURNS clause. The other two types return a table.
Q109. What do you mean by Collation?
Collation is defined as a set of rules that determine how data can be sorted as well as
compared. Character data is sorted using the rules that define the correct character
sequence along with options for specifying case-sensitivity, character width etc.
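SQLite ships with a few built-in collations (BINARY, NOCASE, RTRIM), which makes it easy to see how a collation changes both sorting and comparison. This sketch uses Python's sqlite3 module with invented sample values; SQL Server collations have different names and many more options (accent, kana, and width sensitivity) than SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany("INSERT INTO names VALUES (?)", [("banana",), ("Apple",), ("apple",), ("Banana",)])

# BINARY (SQLite's default) compares raw bytes, so uppercase letters sort before lowercase
binary = [r[0] for r in conn.execute("SELECT name FROM names ORDER BY name COLLATE BINARY")]

# NOCASE folds ASCII upper/lower case, so 'Apple' and 'apple' both equal 'APPLE'
nocase_matches = conn.execute(
    "SELECT COUNT(*) FROM names WHERE name = 'APPLE' COLLATE NOCASE"
).fetchone()[0]

print(binary)          # ['Apple', 'Banana', 'apple', 'banana']
print(nocase_matches)  # 2
```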
Q110. What are the different types of Collation Sensitivity?
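Following are the different types of collation sensitivity:

• Case Sensitivity: whether A and a (or B and b) are treated as different characters.
• Kana Sensitivity: whether the two types of Japanese kana characters, Hiragana and Katakana, are treated as different.
• Width Sensitivity: whether a single-byte character and the same character represented as a double-byte character are treated as different.
• Accent Sensitivity: whether a and á (or o and ö) are treated as different.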
Q111. What are Local and Global variables?
Local variables:
These variables exist and can be used only inside the function in which they are declared. They cannot be used or referred to by any other function.
Global variables:
These variables can be accessed throughout the program. They are created once and are not re-created each time a function is called.
Q112. What is Auto Increment in SQL?

The AUTO INCREMENT feature lets the database generate a unique number automatically whenever a new record is inserted into a table.
It is most commonly used for the PRIMARY KEY column.
The AUTO_INCREMENT keyword is used in MySQL, while the IDENTITY keyword is used in SQL Server (Oracle has traditionally used sequences for the same purpose).
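Different databases spell this feature differently; SQLite, used here through Python's sqlite3 module, spells it INTEGER PRIMARY KEY AUTOINCREMENT. This minimal sketch, with a hypothetical customers table, shows that the id column is never supplied by the INSERT statements, yet each new row receives the next unique number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite spelling; MySQL uses AUTO_INCREMENT and SQL Server uses IDENTITY
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)")

for name in ["Asha", "Ben", "Chen"]:
    cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    print(name, cur.lastrowid)  # id generated by the database, not by our code

ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
print(ids)  # [1, 2, 3]
```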
Q113. What is a data warehouse?
A data warehouse is a central repository in which data is assembled from multiple sources of information. The data is consolidated, transformed, and made available for data mining as well as online analytical processing (OLAP). A subset of warehouse data that serves a particular business area is called a data mart.
Q114. What are the different authentication modes in SQL Server? How can they be changed?
SQL Server has two authentication modes: Windows Authentication mode and Mixed Mode (SQL Server and Windows Authentication). In SQL Enterprise Manager (used with older versions of SQL Server), you can change the authentication mode with the following steps:

• Click Start > Programs > Microsoft SQL Server and click SQL Enterprise Manager to run SQL Enterprise Manager from the Microsoft SQL Server program group.
• Select the server, then select SQL Server Configuration Properties from the Tools menu.
• Choose the Security page.
Q115. What are the STUFF and REPLACE functions?
STUFF Function: This function overwrites existing characters in a string or inserts a string into another string. Syntax:
STUFF(string_expression, start, length, replacement_characters)
where,
string_expression: the string that will have characters substituted
start: the starting position (counting from 1) of the characters to substitute
length: the number of characters in the string that are substituted
replacement_characters: the new characters that are injected into the string
REPLACE Function: This function replaces every occurrence of a specified substring within a string with another string. Syntax:
REPLACE(string_expression, string_pattern, string_replacement)
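The semantics of STUFF are easy to mirror in Python, which helps when reasoning about 1-based positions. The stuff helper below is a hypothetical sketch modeled on SQL Server's documented behavior (NULL, shown here as None, for a start position of zero or less, a start beyond the end of the string, or a negative length). Python's str.replace plays the role of REPLACE, with one caveat: it is always case-sensitive, whereas SQL Server's REPLACE compares according to the collation of its input.

```python
def stuff(s, start, length, replacement):
    """Mimic SQL Server STUFF: delete `length` chars from 1-based `start`, then insert `replacement`."""
    # SQL Server returns NULL for these cases; None stands in for NULL here
    if start < 1 or start > len(s) or length < 0:
        return None
    # A length running past the end simply deletes to the last character
    return s[:start - 1] + replacement + s[start - 1 + length:]

print(stuff("abcdef", 2, 3, "ijklmn"))        # aijklmnef
print("abcdefghicde".replace("cde", "xxx"))  # abxxxfghixxx (every occurrence replaced)
```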

Data Analyst Interview Questions: Basic
This section of questions will consist of all the basic questions that you need to know
related to Data Analytics and its terminologies.

Q1. What is the difference between Data Mining and Data Analysis?

Data Mining | Data Analysis
Used to recognize patterns in stored data. | Used to order and organize raw data in a meaningful manner.
Mining is performed on clean and well-documented data. | The analysis of data involves data cleaning, so the data is not present in a well-documented format.
Results extracted from data mining are not easy to interpret. | Results extracted from data analysis are easy to interpret.
Table 1: Data Mining vs Data Analysis – Data Analyst Interview Questions

So, to summarize: Data Mining is used to identify patterns in stored data. It is mostly used for Machine Learning, where analysts recognize the patterns with the help of algorithms. Data Analysis, on the other hand, is used to gather insights from raw data, which has to be cleaned and organized before the analysis is performed.

Q2. What is the process of Data Analysis?

Data analysis is the process of collecting, cleansing, interpreting, transforming and
modeling data to gather insights and generate reports to gain business profits. Refer to
the image below to know the various steps involved in the process.

Fig 1: Process of Data Analysis – Data Analyst Interview Questions

• Collect Data: The data is collected from various sources and stored so that it can be cleaned and prepared. In this step, missing values and outliers are removed.
• Analyze Data: Once the data is ready, the next step is to analyze it. A model is run repeatedly to improve it. Then, the model is validated to check whether it meets the business requirements.
• Create Reports: Finally, the model is implemented, and the reports it generates are passed on to the stakeholders.

Q3. What is the difference between Data Mining and Data Profiling?

Data Mining: Data Mining refers to the analysis of data to find relations that have not been discovered earlier. It mainly focuses on the detection of unusual records, dependencies, and cluster analysis.

Data Profiling: Data Profiling refers to the process of analyzing individual attributes of data. It mainly focuses on providing valuable information on data attributes such as data type, frequency, etc.
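As a rough illustration of profiling individual attributes, the sketch below summarizes one column at a time: which data types appear, how many values are missing, and how often each value occurs. The records and the profile helper are hypothetical; real profiling tools report many more statistics.

```python
from collections import Counter

# Hypothetical records; None marks a missing value
records = [
    {"city": "Pune", "age": 31},
    {"city": "Delhi", "age": None},
    {"city": "Pune", "age": 27},
    {"city": "Mumbai", "age": "n/a"},
]

def profile(rows, column):
    """Summarize one attribute: data types seen, missing count, and value frequencies."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "types": Counter(type(v).__name__ for v in present),
        "missing": len(values) - len(present),
        "frequency": Counter(present),
    }

city = profile(records, "city")
age = profile(records, "age")
print(city["frequency"].most_common(1))  # [('Pune', 2)]
print(age["types"])  # mixed int and str values flag a data-quality problem
```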

Q4. What is data cleansing and what are the best ways to practice data
cleansing?

Data cleansing, also called data cleaning or data wrangling, is the process of identifying and removing errors to enhance the quality of data. You can refer to the image below to see the various ways to deal with missing data.
Fig 2: Ways of Data Cleansing – Data Analyst Interview Questions
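A few common cleansing steps can be sketched in plain Python: standardizing text, removing duplicates, and filling missing values. The rows and the clean helper are hypothetical, and mean imputation is only one of several ways to deal with missing data; dropping the row or using a model-based method are equally valid depending on the situation.

```python
# Hypothetical raw rows: inconsistent spacing/case, a duplicate, and a missing value
raw = [
    {"name": " Asha ", "city": "pune", "sales": 120.0},
    {"name": "Asha", "city": "Pune", "sales": 120.0},
    {"name": "Ben", "city": "Delhi", "sales": None},
    {"name": "Chen", "city": "mumbai", "sales": 80.0},
]

def clean(rows):
    # 1. Standardize text fields: trim whitespace and use consistent capitalization
    normalized = [
        {**r, "name": r["name"].strip(), "city": r["city"].strip().title()} for r in rows
    ]
    # 2. Remove exact duplicates, keeping the first occurrence
    seen, deduped = set(), []
    for r in normalized:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    # 3. Impute missing sales with the mean of the observed values
    observed = [r["sales"] for r in deduped if r["sales"] is not None]
    mean_sales = sum(observed) / len(observed)
    return [{**r, "sales": mean_sales if r["sales"] is None else r["sales"]} for r in deduped]

cleaned = clean(raw)
print(cleaned)
```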

Q5. What are the important steps in the data validation process?

As the name suggests, Data Validation is the process of validating data. This step mainly involves two processes: Data Screening and Data Verification.

• Data Screening: Different kinds of algorithms are used in this step to screen the entire dataset for inaccurate values.
• Data Verification: Each suspected value is evaluated against various use cases, and then a final decision is taken on whether the value has to be included in the data or not.
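One simple form of data screening is a set of rule-based checks that flag suspect values for later verification. In this hypothetical sketch the rules (an age range and an "@" in the email) are assumptions chosen for illustration; the screen only produces a list of suspects, and deciding whether each value stays in the data is the separate verification step.

```python
# Hypothetical screening rules: each returns True when a value looks valid
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def screen(rows):
    """Data screening: return (row_index, field, value) for every value that fails a rule."""
    suspects = []
    for i, row in enumerate(rows):
        for field, rule in RULES.items():
            if not rule(row.get(field)):
                suspects.append((i, field, row.get(field)))
    return suspects

rows = [
    {"age": 34, "email": "a@example.com"},
    {"age": 430, "email": "b@example.com"},  # possibly a typo for 43
    {"age": 28, "email": "c.example.com"},   # missing '@'
]
suspects = screen(rows)
print(suspects)  # [(1, 'age', 430), (2, 'email', 'c.example.com')]
```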

Q6. What do you think are the criteria to say whether a developed data
model is good or not?

Well, the answer to this question may vary from person to person. But below are a few criteria which I think must be considered to decide whether a developed data model is good or not:

• A model developed for the dataset should have predictable performance. This is required to predict the future.
• A model is said to be a good model if it can easily adapt to changes in business requirements.
• If the data changes, the model should be able to scale according to the data.
• The model developed should also be easily consumable by the clients for actionable and profitable results.

Q7. When do you think you should retrain a model? Is it dependent on the data?

Business data keeps changing on a day-to-day basis, but its format does not change. Whenever a business enters a new market, faces a sudden rise in competition, or sees its own position rising or falling, it is recommended to retrain the model. In other words, as business dynamics change, the model should be retrained to reflect the changing behavior of customers.

Q8. Can you mention a few problems that data analysts usually encounter while performing analysis?

The following are a few problems that are usually encountered while performing data analysis.

• Duplicate entries and spelling mistakes reduce data quality.
• If you are extracting data from a poor source, you may have to spend a lot of time cleaning the data.
• Data extracted from different sources may vary in representation. When you combine data from these sources, the variation in representation can cause delays.
• Lastly, incomplete data makes it difficult to perform the analysis.

Q9. What is the KNN imputation method?

KNN (k-nearest neighbors) imputation fills in a missing attribute value using the values of that attribute in the records that are most similar to the record with the missing value. The similarity between records is determined using a distance function (for example, Euclidean distance) over the other attributes.
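A minimal sketch of KNN imputation in plain Python follows. It assumes all the non-target columns are numeric and present, uses Euclidean distance, and averages the target value of the k nearest complete rows; the data is invented, and libraries such as scikit-learn offer more complete implementations.

```python
import math

def distance(a, b, skip):
    """Euclidean distance between two rows, ignoring the column being imputed."""
    return math.sqrt(sum((x - y) ** 2 for j, (x, y) in enumerate(zip(a, b)) if j != skip))

def knn_impute(rows, target, k=2):
    """Fill None in column `target` with the mean target value of the k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    result = []
    for r in rows:
        if r[target] is not None:
            result.append(list(r))
            continue
        neighbors = sorted(complete, key=lambda other: distance(r, other, target))[:k]
        filled = list(r)
        filled[target] = sum(n[target] for n in neighbors) / len(neighbors)
        result.append(filled)
    return result

rows = [[1.0, 2.0, 10.0], [1.1, 2.1, 12.0], [5.0, 6.0, 50.0], [1.05, 2.05, None]]
imputed = knn_impute(rows, target=2, k=2)
print(imputed[3])  # [1.05, 2.05, 11.0]: average of the two closest rows
```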

Q10. Mention the name of the framework developed by Apache for processing large datasets for an application in a distributed computing environment.

The Hadoop Ecosystem was developed for processing large datasets for an application in a distributed computing environment. The Hadoop Ecosystem consists of the following components:

• HDFS -> Hadoop Distributed File System
• YARN -> Yet Another Resource Negotiator
• MapReduce -> Data processing using programming
• Spark -> In-memory data processing
• Pig, Hive -> Data processing services using queries (SQL-like)
• HBase -> NoSQL database
• Mahout, Spark MLlib -> Machine learning
• Apache Drill -> SQL on Hadoop
• ZooKeeper -> Cluster management
• Oozie -> Job scheduling
• Flume, Sqoop -> Data ingestion services
• Solr & Lucene -> Searching & indexing
• Ambari -> Provisioning, monitoring and maintaining the cluster
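MapReduce, listed above, splits processing into a map step, a shuffle that groups intermediate results by key, and a reduce step. The sketch below simulates that flow in a single Python process with the classic word-count example; it only illustrates the programming model, not Hadoop's actual Java API, and the input lines are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values emitted for the same key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: combine the grouped values for one key
    return key, sum(values)

lines = ["big data needs big clusters", "data moves to compute"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the logic per key stays the same.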

Now, moving on to the next set of questions, which is the Excel Interview Questions.

Data Analyst Interview Questions: Excel

Microsoft Excel is one of the simplest and most powerful software applications available. It lets users perform quantitative and statistical analysis through an intuitive interface for data manipulation, so much so that its usage spans different domains and professional requirements. This is an important field that gives a head start for becoming a Data Analyst. So, let us quickly discuss the questions asked on this topic.

Q1. Can you tell what a waterfall chart is and when we use it?

A waterfall chart shows the positive and negative values that lead to a final result value. For example, if you are analyzing a company's net income, you can include all the cost values in this chart. With such a chart, you can see visually how the value goes from revenue to net income once all the costs are deducted.
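Behind a waterfall chart is a simple running total: each intermediate bar floats between the previous total and the new one. This hypothetical sketch computes the bottom and top of every bar from invented revenue and cost figures, assuming the first and last bars start at zero; the result could be fed to any charting tool.

```python
def waterfall(start_label, start_value, steps, end_label):
    """Return (label, bar_bottom, bar_top, running_total) for each bar of a waterfall chart."""
    bars = [(start_label, 0, start_value, start_value)]
    total = start_value
    for label, change in steps:
        new_total = total + change
        # A floating bar spans from the previous running total to the new one
        bars.append((label, min(total, new_total), max(total, new_total), new_total))
        total = new_total
    bars.append((end_label, 0, total, total))  # final bar rests on zero
    return bars

bars = waterfall(
    "Revenue", 100,
    [("Cost of goods sold", -40), ("Operating expenses", -25), ("Tax", -10)],
    "Net income",
)
for bar in bars:
    print(bar)
```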

Q2. How can you highlight cells with negative values in Excel?

You can highlight cells with negative values in Excel by using conditional formatting. Below are the steps that you can follow:

• Select the cells that you want to check for negative values.
• Go to the Home tab and click the Conditional Formatting option.
• Go to Highlight Cells Rules and click the Less Than option.
• In the Less Than dialog box, specify the value as 0.

Fig 3: Snapshot of Highlighting cells in Excel – Data Analyst Interview Questions

Q3. How can you clear all the formatting without actually removing the cell
contents?

Sometimes you may want to remove all the formatting and keep only the basic data. To do this, you can use the 'Clear Formats' option found in the Home tab; it appears when you click the 'Clear' drop-down.

Fig 4: Snapshot of clearing all formatting in Excel – Data Analyst Interview Questions

Q4. What is a Pivot Table, and what are the different sections of a Pivot Table?

A Pivot Table is a feature in Microsoft Excel that allows you to quickly summarize huge datasets. It is easy to use, as you create reports by dragging and dropping row and column headers.

A Pivot Table is made up of four different sections:

• Values Area: The values are reported in this area.
• Rows Area: The headings present to the left of the values area.
• Columns Area: The headings at the top of the values area make up the columns area.
• Filter Area: An optional filter used to drill down into the dataset.
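The four sections map neatly onto a small aggregation. This hypothetical sketch builds a pivot in plain Python with region as the Rows Area, quarter as the Columns Area, the summed amount as the Values Area, and an optional region filter as the Filter Area; the sales records are invented.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, amount)
sales = [
    ("East", "Q1", 10), ("East", "Q2", 5),
    ("West", "Q1", 7), ("East", "Q1", 3), ("West", "Q2", 4),
]

def pivot(records, region_filter=None):
    """Rows = region, Columns = quarter, Values = sum of amount, plus an optional region filter."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in records:
        if region_filter is not None and region not in region_filter:
            continue  # Filter Area: skip records that do not match
        table[region][quarter] += amount  # Values Area: aggregate by summing
    return {row: dict(cols) for row, cols in table.items()}

print(pivot(sales))            # {'East': {'Q1': 13, 'Q2': 5}, 'West': {'Q1': 7, 'Q2': 4}}
print(pivot(sales, {"West"}))  # {'West': {'Q1': 7, 'Q2': 4}}
```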

Q5. Can you make a Pivot Table from multiple tables?

Yes, we can create one Pivot Table from multiple different tables when a relationship has been defined between these tables.

Q6. How can we select all blank cells in Excel?

If you wish to select all the blank cells in Excel, you can use the Go To Special dialog box. Below are the steps that you can follow to select all the blank cells in Excel.

• First, select the entire dataset and press F5. This opens the Go To dialog box.
• Click the 'Special' button, which opens the Go To Special dialog box.
• Select Blanks and click OK.

The final step will select all the blank cells in your dataset.

Q7. What are the most common questions you should ask a client before creating
a dashboard?

Well, the answer to this question varies from case to case. But here are a few common questions that you can ask before creating a dashboard in Excel:

• Purpose of the dashboard
• Different data sources
• Usage of the Excel dashboard
• Frequency at which the dashboard needs to be updated
• Version of Office the client uses

Q8. What is a Print Area and how can you set it in Excel?

A Print Area in Excel is a range of cells that you designate to print whenever you print that worksheet. For example, if you want to print only the first 20 rows of the worksheet, you can set the first 20 rows as the Print Area.

To set the Print Area in Excel, follow the steps below:

• Select the cells for which you want to set the Print Area.
• Click the Page Layout tab.
• Click Print Area.
• Click Set Print Area.

Q9. What steps can you take to handle slow Excel workbooks?

There are various ways to handle slow Excel workbooks. Here are a few of them:

• Use manual calculation mode.
• Keep all the referenced data in a single sheet.
• Use Excel tables and named ranges.
• Use helper columns instead of array formulas.
• Avoid referencing entire rows or columns.
• Convert formulas whose results no longer need to change into values.

Q10. Can you sort multiple columns at one time?

Multiple sorting refers to sorting by one column and then sorting by another column while keeping the order of the first column intact. In Excel, you can sort by multiple columns at one time.

To do this, use the Sort dialog box: select the data that you want to sort, click the Data tab, and then click the Sort icon.

In this dialog box, specify the sort settings for the first column, then click the Add Level button to add another column to sort by.
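The same idea of sort levels can be expressed in code by sorting on a tuple key, where each tuple element is one level. In this sketch the rows are invented; level 1 sorts by region ascending, level 2 by revenue descending (negated), and level 3 by rep name ascending to break ties.

```python
# Hypothetical rows: (region, sales_rep, revenue)
rows = [
    ("West", "Kim", 300), ("East", "Lee", 150),
    ("West", "Ana", 300), ("East", "Bo", 400),
]

# Level 1: region ascending; Level 2: revenue descending; Level 3: rep name ascending
ordered = sorted(rows, key=lambda r: (r[0], -r[2], r[1]))
for row in ordered:
    print(row)
```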

Moving on to the next set of questions, which relate to Statistics.

Data Analyst Interview Questions: Statistics

Statistics is a branch of mathematics dealing with the collection, organization, analysis, interpretation, and presentation of data. Statistics can be divided into two categories: Descriptive and Inferential Statistics. This field is closely related to mathematics and thus gives a kickstart to a Data Analysis career.

Q1. What do you understand by the term Normal Distribution?

This is one of the most important and widely used distributions in statistics. Commonly known as the Bell Curve or Gaussian curve, a normal distribution is fully described by its mean, which sets its center, and its standard deviation, which measures how much values typically differ from the mean. Refer to the image below.

Fig 5: Normal Distribution – Data Analyst Interview Questions

As you can see in the image above, the data is distributed around a central value without any bias to the left or right side, and the values form a symmetrical bell-shaped curve.
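Python's statistics.NormalDist makes the symmetry and spread concrete. This sketch uses a standard normal distribution (mean 0, standard deviation 1) to check that half the probability lies on each side of the mean and to compute the familiar shares of values within one and two standard deviations.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, standard deviation 1

within_1sd = dist.cdf(1) - dist.cdf(-1)  # share of values within 1 standard deviation
within_2sd = dist.cdf(2) - dist.cdf(-2)  # share of values within 2 standard deviations

print(round(within_1sd, 4), round(within_2sd, 4))  # 0.6827 0.9545
print(dist.cdf(0))  # 0.5: symmetric, half the values lie below the mean
```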

Q2. What is A/B Testing?

A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. Also known as split testing, it is an analytical method that estimates population parameters based on sample statistics. For example, the test can compare two versions of a web page by showing variants A and B to similar numbers of visitors; the variant with the better conversion rate wins.

The goal of A/B testing is to identify whether a change to the web page improves the outcome you care about. For example, if you have spent a significant amount of money on a banner ad, you can find out the return on investment, i.e., the click-through rate on the banner ad.
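One common way to decide whether variant B really beats variant A is a two-proportion z-test. This is a minimal sketch with invented visitor and conversion counts; other analyses (for example, a chi-square test or a Bayesian approach) are equally valid choices.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # overall rate, assuming no difference (H0)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 1000 visitors per variant
z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
print(round(z, 2), round(p, 4))  # a p-value below 0.05 suggests the difference is not chance
```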

Q3. What is the statistical power of sensitivity?

The statistical power of sensitivity is used to validate the accuracy of a classifier. This classifier can be Logistic Regression, Support Vector Machine, Random Forest, etc.

If I have to define sensitivity, then sensitivity is the ratio of correctly predicted true events to the total number of actual true events. True events are the events that were actually true; the correctly predicted ones are those the model also predicts as true. In confusion-matrix terms, sensitivity = True Positives / (True Positives + False Negatives).

Fig 6: Seasonality Formula – Data Analyst Interview Questions
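A minimal sketch of the sensitivity calculation follows, with invented labels where 1 marks a true event. The true positives are events that were true and predicted true; the false negatives are true events the model missed.

```python
def sensitivity(actual, predicted):
    """True positive rate: TP / (TP + FN), where 1 marks the positive (true) event."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp / (tp + fn)

actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
print(sensitivity(actual, predicted))  # 0.75: 3 of the 4 true events were caught
```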

Q4. What is the Alternative Hypothesis?

To explain the Alternative Hypothesis, you can first explain what the Null Hypothesis is. The Null Hypothesis is a statistical hypothesis that is tested for possible rejection, under the assumption that any observed result is due to chance alone.

After this, you can say that the Alternative Hypothesis is the statistical hypothesis contrary to the Null Hypothesis. It states that the observations are the result of a real effect, with some variation due to chance.

Q5. What is the difference between univariate, bivariate and multivariate analysis?

The differences between univariate, bivariate and multivariate analysis are as follows:

• Univariate: A descriptive statistical technique that involves only one variable at a time.
• Bivariate: This analysis is used to find the relationship between two variables at a time.
• Multivariate: The study of more than two variables is multivariate analysis. This analysis is used to understand the effect of the variables on the responses.

Q6. Can you tell me what Eigenvectors and Eigenvalues are?

Eigenvectors: Eigenvectors are used to understand linear transformations. In data analysis, they are typically calculated for a correlation or covariance matrix.

For definition purposes, you can say that Eigenvectors are the directions along which a specific linear transformation acts only by flipping, compressing, or stretching.

Eigenvalue: Eigenvalues can be referred to as the strength of the transformation, i.e., the factor by which stretching or compression occurs in the direction of the eigenvectors.
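For a small symmetric matrix such as a 2x2 covariance matrix, the eigenvalues have a closed form, which makes the "direction plus stretch factor" idea easy to verify. This hypothetical sketch uses the matrix [[2, 1], [1, 2]]; multiplying it by each eigenvector only rescales the vector by the matching eigenvalue. For larger matrices you would normally use a numerical library instead.

```python
import math

def symmetric_2x2_eigen(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] from the characteristic equation, largest first."""
    center = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return center + radius, center - radius

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

matrix = [[2, 1], [1, 2]]              # e.g. a small covariance-like matrix
print(symmetric_2x2_eigen(2, 1, 2))    # (3.0, 1.0)
# Along [1, 1] the matrix stretches by 3; along [1, -1] the length is unchanged (factor 1)
print(mat_vec(matrix, [1, 1]), mat_vec(matrix, [1, -1]))  # [3, 3] [1, -1]
```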

Q7. What is the difference between the 1-Sample T-test and the 2-Sample T-test?

You can answer this question by first explaining what exactly T-tests are. Refer below for an explanation of the T-test.

T-tests are a type of hypothesis test by which you can compare means. Each test that you perform reduces your sample data to a single value, the t-value. Refer below for the formula.

Fig 7: Formula to calculate t-value – Data Analyst Interview Questions

Now, to explain this formula, you can use the analogy of the signal-to-noise ratio, since
the formula is in a ratio format.

Here, the numerator would be a signal and the denominator would be the noise.
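The formula figure did not survive extraction. For reference, the standard 1-sample t-value can be written as follows, where x̄ is the sample mean, μ₀ the null-hypothesis value, s the sample standard deviation, and n the sample size (these symbol names are our own labels, not taken from the figure):

```latex
t = \frac{\overbrace{\bar{x} - \mu_0}^{\text{signal}}}{\underbrace{s / \sqrt{n}}_{\text{noise (standard error)}}}
```

With a sample mean of 7 and a null-hypothesis value of 2, the signal in the numerator is 7 − 2 = 5.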

So, to calculate the signal for the 1-Sample T-test, you subtract the null hypothesis value from the sample mean. If your sample mean is 7 and the null hypothesis value is 2, the signal is 5.

So, we can say that the difference between the sample mean and the null hypothesis value is directly proportional to the strength of the signal.

Now, the denominator, which is the noise, is in our case a measure of variability known as the standard error of the mean. It indicates how accurately your sample estimates the mean of the population, or of your complete dataset.

So, you can consider that noise is inversely proportional to the precision of the sample.

The ratio of signal to noise is the 1-sample t-value. It shows how distinguishable your signal is from the noise.

To calculate the 2-sample t-value, you take the difference between the two sample means (compared with the difference stated by the null hypothesis, usually zero) and divide it by the standard error of that difference.

So, to summarize, the 1-Sample T-test determines how a sample set holds up against a hypothesized mean, while the 2-Sample T-test determines whether the difference between the means of two sample sets is significant for the entire population or purely due to chance.
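The signal-to-noise reasoning translates directly into code. This is a minimal sketch using Python's statistics module; the 2-sample version uses Welch's standard error (no equal-variance assumption), which is one common choice rather than the only 2-sample formula. The sample values are invented, with the first sample chosen to have mean 7 to match the example above. Only t-values are computed; converting them to p-values needs the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, mu0):
    # signal: sample mean minus the null-hypothesis value
    # noise: standard error of the mean (s / sqrt(n))
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))

def welch_two_sample_t(a, b):
    # signal: difference between the two sample means (null hypothesis: difference is 0)
    # noise: standard error of that difference, without assuming equal variances
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

sample = [5, 6, 7, 8, 9]         # mean 7
print(one_sample_t(sample, 2))   # signal 5 / standard error ~0.707, about 7.07
print(welch_two_sample_t(sample, [3, 4, 5, 6, 7]))  # 2.0
```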

Q8. What are different types of Hypothesis Testing?

The different types of hypothesis testing are as follows:

• T-test: A T-test is used when the population standard deviation is unknown and the sample size is comparatively small.
• Chi-Square Test for Independence: This test is used to find out whether there is a significant association between categorical variables in the population sample.
• Analysis of Variance (ANOVA): This kind of hypothesis testing is used to analyze differences between the means of various groups. It is often used similarly to a T-test, but for more than two groups.
• Welch's T-test: This test is used to test for equality of means between two population samples without assuming that their variances are equal.
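As an illustration of the Chi-Square Test for Independence, this sketch computes the Pearson chi-square statistic for a contingency table by comparing each observed count with the count expected if the two categorical variables were independent. The 2x2 table is invented, and only the statistic is computed; the p-value would come from a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    return chi2

table = [[10, 20], [30, 40]]
print(chi_square_statistic(table))  # about 0.794
```

Here the statistic is about 0.79 with 1 degree of freedom, well below the 0.05 critical value of about 3.84, so this made-up table shows no significant association.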

Q9. How to represent a Bayesian Network in the form of Markov Random
Fields (MRF)?

To represent a Bayesian Network in the form of a Markov Random Field, consider the following examples:

Consider two variables, A and B, connected by an edge A -> B in a Bayesian network. The probability distribution factorizes into the probability of A times the conditional probability of B given A. The same network represented as a Markov Random Field would have a single potential function over A and B. Refer below:

Fig 7: Representation of Bayesian Network in MRF  – Data Analyst Interview Questions


That was a simple example to start with. Now, let us move on to a more complex example where one variable is the parent of the other two: A is the parent variable and points down to B and C. In this case, the probability distribution equals the probability of A times the conditional probability of B given A and the conditional probability of C given A. If you convert this into a Markov Random Field, it factorizes over a similarly structured graph, with one potential function for the A–B edge and one for the A–C edge. Refer to the image below.

Fig 8: Representation of Bayesian Network in MRF  – Data Analyst Interview Questions
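Written as formulas, the two examples above become the following. This is one valid choice of potential functions, using the A, B, C names from the text; because the potentials are built directly from the network's conditional probabilities, the normalizing constant Z equals 1.

```latex
% Two variables with an edge A -> B
P(A, B) = P(A)\,P(B \mid A) = \frac{1}{Z}\,\phi(A, B),
  \qquad \phi(A, B) = P(A)\,P(B \mid A), \quad Z = 1

% A is the parent of B and C
P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A) = \frac{1}{Z}\,\phi_{AB}(A, B)\,\phi_{AC}(A, C),
  \qquad \phi_{AB}(A, B) = P(A)\,P(B \mid A), \quad \phi_{AC}(A, C) = P(C \mid A), \quad Z = 1
```

This direct conversion works here because no node has more than one parent. When a node has several parents, the parents must first be connected to each other (moralization) so that a single potential can cover the node together with all of its parents.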

Q10. What is the difference between variance and covariance?

Variance and Covariance are two mathematical terms which are used frequently in statistics. Variance refers to how far apart numbers are from their mean. Covariance, on the other hand, refers to how two random variables change together. It is used to calculate the correlation between variables.
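The definitions can be written out directly. This sketch uses the sample (n − 1) versions of variance and covariance on invented data, and shows how correlation is simply covariance rescaled by both standard deviations. Python 3.10+ also provides statistics.covariance and statistics.correlation if you prefer library calls.

```python
def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # average squared distance from the mean

def sample_covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Positive when the two variables tend to move in the same direction
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance rescaled by both standard deviations, so it always lies between -1 and 1
    return sample_covariance(xs, ys) / (sample_variance(xs) * sample_variance(ys)) ** 0.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(sample_variance(x), sample_covariance(x, y), correlation(x, y))  # 2.5 5.0 1.0
```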


Now, let us move on to the next set of questions which is the SAS Interview Questions.

Data Analyst Interview Questions: SAS
Statistical Analysis System (SAS), provided by SAS Institute, is one of the most popular Data Analytics tools on the market. In simple words, SAS can process complex data and generate meaningful insights that help organizations make better decisions or predict possible outcomes in the near future. It lets you mine, alter, manage and retrieve data from different sources and analyze it.

Q1. What is interleaving in SAS?


Interleaving in SAS means combining individual sorted SAS data sets into one sorted
data set. You can interleave data sets using a SET statement along with a BY
statement.

In the example that you can see below, the data sets are sorted by the variable Age.

Fig 9: Example for Interleaving in SAS  – Data Analyst Interview Questions

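Because both data sets are already sorted by Age, we can interleave them by Age with the following DATA step:

data combined;
   set Data1 Data2;   /* list the input data sets separated by spaces, not commas */
   by Age;            /* output observations in order of Age */
run;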
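For comparison, the same interleaving idea exists in Python as heapq.merge, which combines already-sorted inputs into one sorted stream without re-sorting everything. This is only an analogy to the SAS SET/BY behavior, and the records are invented.

```python
import heapq

# Two data sets, each already sorted by age (hypothetical records)
data1 = [{"name": "Ann", "age": 21}, {"name": "Raj", "age": 35}]
data2 = [{"name": "Tom", "age": 28}, {"name": "Mia", "age": 40}]

# Like SET with BY: merge two sorted inputs into one sorted output
combined = list(heapq.merge(data1, data2, key=lambda rec: rec["age"]))
print([rec["name"] for rec in combined])  # ['Ann', 'Tom', 'Raj', 'Mia']
```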
Q2. What is the basic syntax style of writing code in SAS?

The basic syntax style of writing code in SAS is as follows:

1. Write the DATA statement, which names the data set.
2. Write the INPUT statement to name the variables in the data set.
3. End every statement with a semicolon.
4. Separate the words within a statement with at least one space.

Q3. What is the difference between the Do Index, Do While and the Do
Until loop? Give examples.

To answer this question, you can first explain what a Do loop is. A Do loop is used to execute a block of code repeatedly, based on a condition. You can refer to the image below to see the workflow of the Do loop.

Fig 10: Workflow of Do Loop – Data Analyst Interview Questions

• Do Index loop: An index variable is stepped from a start value to a stop value. The SAS statements are executed repeatedly until the index variable passes its stop value.
• Do While loop: The Do While loop uses a WHILE condition, which is evaluated at the top of the loop. The loop executes the block of code while the condition is true and terminates once the condition becomes false.
• Do Until loop: The Do Until loop uses an UNTIL condition, which is evaluated at the bottom of the loop. The loop executes the block of code while the condition is false and terminates once the condition becomes true. Because the check happens at the bottom, a Do Until loop always executes at least once.

To explain this with code, let us say we want to calculate the SUM of a sequence of numbers and see the final value of the loop variable.

For the loops you can write the code as follows:

Do Index

DATA ExampleLoop;
   SUM = 0;
   DO VAR = 1 TO 10;      /* index loop: VAR runs from 1 to 10 */
      SUM = SUM + VAR;
   END;
RUN;

PROC PRINT DATA = ExampleLoop;
RUN;
The output would be:

Obs  SUM  VAR
1    55   11
Table 2: Output of Do Index Loop – Data Analyst Interview Questions

Do While

DATA ExampleLoop;
   SUM = 0;
   VAR = 1;
   DO WHILE (VAR < 15);   /* condition checked before each pass */
      SUM = SUM + VAR;
      VAR + 1;            /* sum statement: increments VAR by 1 */
   END;
RUN;

PROC PRINT DATA = ExampleLoop;
RUN;

Obs  SUM  VAR
1    105  15
Table 3: Output of Do While Loop – Data Analyst Interview Questions

Do Until

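DATA ExampleLoop;
   SUM = 0;
   VAR = 1;
   DO UNTIL (VAR > 15);   /* condition checked after each pass */
      SUM = SUM + VAR;
      VAR + 1;
   END;
RUN;

PROC PRINT;               /* with no DATA= option, prints the most recently created data set */
RUN;

Obs  SUM  VAR
1    120  16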
Table 4: Output of Do Until Loop  – Data Analyst Interview Questions
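To check the three outputs above, here is a hypothetical Python translation of the SAS loops. Python has no DO UNTIL, so it is emulated with while True and a break at the bottom of the body, which reproduces the at-least-once behavior; the index loop is written as a while loop so that, as in SAS, the index ends one step past its stop value.

```python
def do_index(start=1, stop=10):
    total, var = 0, start
    while var <= stop:      # DO VAR = start TO stop: test before each pass
        total += var
        var += 1            # increment after each pass, so VAR ends at stop + 1
    return total, var

def do_while(limit=15):
    total, var = 0, 1
    while var < limit:      # DO WHILE (VAR < limit): condition checked at the top
        total += var
        var += 1
    return total, var

def do_until(limit=15):
    total, var = 0, 1
    while True:
        total += var
        var += 1
        if var > limit:     # DO UNTIL (VAR > limit): condition checked at the bottom
            break
    return total, var

print(do_index(), do_while(), do_until())    # (55, 11) (105, 15) (120, 16)
print(do_while(limit=1), do_until(limit=0))  # (0, 1) (1, 2): only the UNTIL body ran
```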

Q4. What is the ANYDIGIT function in SAS?

The ANYDIGIT function searches a character string for the first occurrence of any digit (0 through 9). If a digit is found, it returns the position of that character; otherwise, it returns 0.
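For intuition, the hypothetical anydigit helper below mirrors this behavior in Python: it returns the 1-based position of the first digit, or 0 when there is none. It ignores the optional start-position argument that the SAS function also accepts.

```python
def anydigit(text):
    """Return the 1-based position of the first digit 0-9 in text, or 0 if there is none."""
    for position, char in enumerate(text, start=1):
        if char in "0123456789":
            return position
    return 0

print(anydigit("abc123"))     # 4
print(anydigit("no digits"))  # 0
```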

Q5. Can you tell the difference between VAR X1-X3 and VAR X1--X3?

A single dash between two variable names specifies a numbered range: consecutively numbered variables with the same prefix. A double dash specifies a name range: all the variables that are positioned between the two named variables in the data set, inclusive.

For Example:

Consider the following data set:

Data Set: ID NAME X1 X2 Y1 X3

Then, X1-X3 would return X1 X2 X3

and X1--X3 would return X1 X2 Y1 X3
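The two list styles can be imitated in Python to make the difference explicit: one builds names from a prefix and a number range, the other slices the columns by their storage position. Both helpers are hypothetical illustrations using the example data set above, not SAS internals.

```python
import re

columns = ["ID", "NAME", "X1", "X2", "Y1", "X3"]  # storage order in the data set

def numbered_range(first, last):
    """X1-X3: names with the same prefix and numbers first..last, regardless of position."""
    prefix, lo = re.match(r"([A-Za-z_]+)(\d+)$", first).groups()
    _, hi = re.match(r"([A-Za-z_]+)(\d+)$", last).groups()
    return [f"{prefix}{n}" for n in range(int(lo), int(hi) + 1)]

def positional_range(cols, first, last):
    """X1--X3: every column stored between first and last, inclusive."""
    return cols[cols.index(first): cols.index(last) + 1]

print(numbered_range("X1", "X3"))             # ['X1', 'X2', 'X3']
print(positional_range(columns, "X1", "X3"))  # ['X1', 'X2', 'Y1', 'X3']
```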

Q6. What is the purpose of trailing @ and @@? How do you use them?

The @ symbol is a column pointer control in the INPUT statement. When it appears at the end of an INPUT statement (a trailing @), it holds the current raw data line, which gives you the ability to read part of the line, test it, and decide how the additional data should be read from the same record.

• The single trailing @ tells the SAS system to "hold the line" for subsequent INPUT statements in the same iteration of the DATA step.
• The double trailing @@ tells the SAS system to "hold the line more strongly".

An INPUT statement ending with @@ instructs the program to release the current raw data line only when there are no data values left to be read from that line. The @@, therefore, holds the input record even across multiple iterations of the DATA step.

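Q7. What would be the result of the following SAS functions (given that 31 Dec 2017 is a Saturday)?

Weeks = intck('week', '31dec2017'd, '01jan2018'd);
Years = intck('year', '31dec2017'd, '01jan2018'd);
Months = intck('month', '31dec2017'd, '01jan2018'd);

Here, we calculate the number of weeks between 31st December 2017 and 1st January 2018. Taking the question's premise that 31st December 2017 is a Saturday, 1st January 2018 would be a Sunday in the next week (SAS weeks start on Sunday).

• Hence, Weeks = 1, since the two days are in different weeks.
• Years = 1, since the two days are in different calendar years.
• Months = 1, since the two days are in different calendar months.

Note that in the actual calendar, 31st December 2017 was a Sunday and 1st January 2018 a Monday, so both dates fall in the same Sunday-to-Saturday week and SAS would actually return Weeks = 0. The answer of 1 holds only under the premise stated in the question.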
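You can check the calendar facts with Python's datetime module. The hypothetical helpers below imitate INTCK's default counting of interval boundaries for Sunday-start weeks, months, and years. Run on the real calendar, they show that 31 December 2017 was a Sunday, so the week count is 0 while the month and year counts are 1; a week count of 1 depends on the question's Saturday premise.

```python
from datetime import date

def intck_week(start, end):
    # date(1, 1, 1) is a Monday with ordinal 1, so ordinal // 7 increases on every Sunday
    return end.toordinal() // 7 - start.toordinal() // 7

def intck_month(start, end):
    return (end.year * 12 + end.month) - (start.year * 12 + start.month)

def intck_year(start, end):
    return end.year - start.year

d1, d2 = date(2017, 12, 31), date(2018, 1, 1)
print(d1.weekday())  # 6, i.e. Sunday (Monday is 0)
print(intck_week(d1, d2), intck_month(d1, d2), intck_year(d1, d2))  # 0 1 1
print(intck_week(date(2017, 12, 30), d1))  # 1: Saturday to Sunday crosses a week boundary
```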

Q8. How does PROC SQL work?

PROC SQL processes all the observations simultaneously (as a set), rather than one observation at a time. The following steps occur when PROC SQL is executed:

• SAS scans each statement in the SQL procedure and checks it for syntax errors.
• The SQL optimizer scans the query inside the statement and decides how the query should be executed in order to minimize the runtime.
• Any tables in the FROM clause are loaded into the data engine, where they can then be accessed in memory.
• Code and calculations are executed.
• The final table is created in memory.
• The final table is sent to the output table described in the SQL statement.

Q9. If you are given an unsorted data set, how will you read the last
observation to a new dataset?

We can read the last observation into a new data set using the END= option of the SET statement.

For example:

data example.newdataset;
   set example.olddataset end=last;   /* last = 1 only while reading the final observation */
   if last;                           /* subsetting IF: keep only that observation */
run;

Here, newdataset is the new data set to be created and olddataset is the existing data set. last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation.

Q10. What are the differences between the SUM function and the "+" operator?

The SUM function returns the sum of its non-missing arguments, whereas the "+" operator returns a missing value if any of its operands are missing. Consider the following example.

Example:

data exampledata1;
   input a b c;
   cards;
44 4 4
34 3 4
34 3 4
. 1 2
24 . 4
44 4 .
25 3 1
;
run;

data exampledata2;
   set exampledata1;
   x = sum(a, b, c);   /* ignores missing arguments */
   y = a + b + c;      /* missing if any operand is missing */
run;

In the output, the value of y is missing for the 4th, 5th, and 6th observations because we used the "+" operator to calculate y.

x y
52 52
41 41
41 41
3 .
28 .
48 .
29 29
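The same rule can be reproduced in Python with None standing in for a SAS missing value. The two helpers below are hypothetical imitations of SUM() and "+", run on the seven rows from the example; they reproduce the x and y columns shown above.

```python
rows = [(44, 4, 4), (34, 3, 4), (34, 3, 4), (None, 1, 2), (24, None, 4), (44, 4, None), (25, 3, 1)]

def sas_sum(*args):
    """Like SAS SUM(): add the non-missing arguments; missing only if all are missing."""
    present = [v for v in args if v is not None]
    return sum(present) if present else None

def sas_plus(*args):
    """Like the + operator in SAS: any missing operand makes the result missing."""
    return None if any(v is None for v in args) else sum(args)

x = [sas_sum(*r) for r in rows]
y = [sas_plus(*r) for r in rows]
print(x)  # [52, 41, 41, 3, 28, 48, 29]
print(y)  # [52, 41, 41, None, None, None, 29]
```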
