0% found this document useful (0 votes)
713 views

100 Python Interview Questions & Answer For Data Science

This document provides 100 Python interview questions and answers for data scientists to help them prepare for interviews. It covers common questions about Python fundamentals like modules, imports, virtual environments and memory management. It also compares lists and tuples in Python and explains lambda functions and the map function. Mastering the content can help elevate Python skills and land a dream data science job.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
713 views

100 Python Interview Questions & Answer For Data Science

This document provides 100 Python interview questions and answers for data scientists to help them prepare for interviews. It covers common questions about Python fundamentals like modules, imports, virtual environments and memory management. It also compares lists and tuples in Python and explains lambda functions and the map function. Mastering the content can help elevate Python skills and land a dream data science job.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 72

100 Python Interview Questions

& Answer For Data Science


100 Python Interview Question & Answers
For Data Science
Elevate your Python skills and land your dream data science job.

This comprehensive list covers 100 of the most commonly asked Python
interview questions, with detailed answers and explanations.

Whether you're a beginner or an experienced Pythonista, this will help you


prepare for your next data science interview.

Don't miss out on this essential resource for data science professionals.

1. What is Python and what are its advantages?


Python is a high-level, interpreted programming language that is widely used for a
variety of applications due to its simplicity , versatility , and vast library of
modules and packages. One of the main advantages of Python is its ease of use
and readability, which makes it accessible for beginners and allows for faster
development times. Additionally, Python's syntax is concise and clear, which reduces
the likelihood of errors and makes it easier to maintain code.

2. What is a Python module?


In Python, a module is a file that contains Python code, which can be imported and
used in other Python code. A package, on the other hand, is a collection of modules
that are organized into a directory hierarchy. This allows for more complex projects
to be easily managed and maintained. To create a module, you typically create a
Python script with a .py extension. For example, you could have a module named
my_module.py. Inside this file, you can define functions, classes, or other code that
you want to make available to other programs.

3. How do you import a module in Python?


To import a module in Python, you can use the import statement followed by the
name of the module. For example, import math . Here are the common import

100 Python Interview Questions & Answer For Data Science 1


statements:

Import the entire module:

import module_name This allows you to access the module's contents using the
module name as a prefix. For example, if you have a module named math, you can
use functions from that module like this: math.sqrt(25) .

Import specific items from a module:

from module_name import item_name

With this syntax, you can import specific functions, classes, or variables directly into
your code, without needing to use the module name as a prefix. For example: from
math import sqrt. Now you can directly use sqrt(25).

Import the entire module with a custom name:

import module_name as alias_name

This imports the entire module but assigns it a custom name (alias) that you specify.
This can be helpful if the module name is long or conflicts with another name in your
code. For example:

import math as m
#Now you can use m.
sqrt(25)

4. How do you create a Python virtual environment?


To create a Python virtual environment, you can use the venv module, which is
included with Python3. To create a virtual environment, run the command python -m
venv <name> in your terminal or command prompt, where <name> is the name of your

virtual environment.

5. How is memory managed in Python?

100 Python Interview Questions & Answer For Data Science 2


1. Memory management in python is managed by Python private heap space. All
Python objects and data structures are located in a private heap. The
programmer does not have access to this private heap. The python interpreter
takes care of this instead.

2. The allocation of heap space for Python objects is done by Python’s memory
manager. The core API gives access to some tools for the programmer to code.

3. Python also has an inbuilt garbage collector, which recycles all the unused
memory and so that it can be made available to the heap space

6. What is the difference between a Python list and a


tuple?
Features Lists Tuples

Tuples, on the other hand,


are immutable, meaning that
Lists are mutable, which once a tuple is created, you
means you can add , remove , cannot change its elements.
Mutability
or modify elements after the If you need to modify a tuple,
list is created. you would need to create a
new tuple with the desired
change
Lists are defined using while tuples are defined using
Syntax
square brackets [] parentheses () .

Lists are typically used for


collections of elements where
Tuples, on the other hand,
the order and individual
elements may change. They are often used for collections
of elements where the order
are commonly used for
Usage and values should not
sequences of data and when
change, such as coordinates ,
you need to perform
database records , or function
operations such as
arguments.
appending , extending , or
removing elements.

Tuples are generally


slightly more memory-
Lists, being mutable,
efficient and faster to
require additional memory
Performance access compared to lists.
allocation and support for
Since tuples are
dynamic resizing.
immutable, Python can
optimize them internally.

100 Python Interview Questions & Answer For Data Science 3


Features Lists Tuples

Lists support common


operations such as
indexing, slicing, and
iterating over elements.
Tuples support common
However, lists have additional
operations such as
Common Operations methods like append() ,
indexing, slicing, and
extend() , and remove() that
iterating over elements.
allow for in-place
modifications, which are not
available for tuples due to
their immutability.

7. What is a lambda function in Python?


In Python, a lambda function is a small anonymous function that can be defined
without a name. Lambda functions are typically used for short and simple operations
that can be defined inline in the code, without the need to create a separate function.
Lambda functions are defined using the lambda keyword, followed by the function's
arguments and the operation that the function should perform. The syntax for a
lambda function is as follows:

```
lambda arguments: expression`

```

For example, the following lambda function takes a number as input and returns its
square:

square = lambda x: x**2

8. What is the `map` function in Python?


The `map` function in Python applies a function to each element in a sequence and
returns a new sequence with the results. In Python, the map() function is a built-in
function that allows you to apply a given function to each element of an iterable
(such as a list , tuple , or string ) and returns an iterator that yields the results. It

100 Python Interview Questions & Answer For Data Science 4


provides a concise way to perform the same operation on every item in a collection
without writing explicit loops. The syntax for the map() function is as follows:

map(function, iterable)

Example:

9. What is the `filter` function in Python?


In Python, the filter() function is a built-in function that allows you to filter elements
from an iterable (such as a list, tuple, or string) based on a specified condition. It
returns an iterator that yields the elements from the iterable for which the condition
evaluates to True. The syntax for the filter() function is as follows:

filter(function, iterable)

The filter() function applies the provided function to each element of the iterable
and returns an iterator that yields the elements for which the condition is True.

100 Python Interview Questions & Answer For Data Science 5


10. What is the `reduce` function in Python?
In Python, the reduce() function is a part of the functools module and is used for
performing a cumulative computation on a sequence of elements. It applies a
specified function to the elements of an iterable in a cumulative way, reducing the
sequence to a single value. To use the reduce() function, you need to import it from
the functools module:

from functools import reduce

The syntax for the reduce() function is as follows:

reduce(function, iterable, initializer)

11. What is a generator in Python?


Generators in Python are a powerful feature that allows for the creation of iterators
that can be used to generate sequences of values.

This can be particularly useful when working with large datasets or when memory
constraints are a concern. By utilizing the yield keyword , generators can pause
execution and resume later, allowing for efficient and flexible processing of data.
In Python, a generator is a special type of iterator that generates values on the fly. It
allows you to write iterable objects by defining a function that uses the yield
keyword instead of return to provide values one at a time. Generators are memory-
efficient and provide a convenient way to work with large datasets or infinite
sequences. Here's an example of a simple generator function:

100 Python Interview Questions & Answer For Data Science 6


In this example, the count_up_to() function is a generator that generates numbers
from 0 up to a given n value. Instead of returning all the numbers at once, it yields
them one by one using the yield keyword. To use the generator and obtain its values,
you can iterate over it or use the next() function:

When the generator function encounters a yield statement, it temporarily suspends


its execution and returns the yielded value. The state of the generator function is
saved, allowing it to resume execution from where it left off the next time next() is
called.

12. What is pickling and unpickling in Python?


Pickling and unpickling are the processes of serializing and deserializing Python
objects, respectively. These processes allow you to convert complex objects into a
byte stream (serialization) and convert the byte stream back into the original object
(deserialization).

In Python, the pickle module provides functionality for pickling and unpickling
objects.

100 Python Interview Questions & Answer For Data Science 7


Pickling: Pickling is the process of converting a Python object into a byte stream,
which can be saved to a file, transmitted over a network, or stored in a database.
The pickle. dump() function is used to pickle an object by writing it to a file-like
object. The pickle. dumps() function is used to pickle an object and return the byte
stream without writing it to a file. Pickled objects can be saved with the file extension
.pickle or . pkl .

Unpickling: Unpickling is the process of restoring a pickled byte stream back into
the original Python object. The pickle. load() function is used to unpickle an
object from a file-like object. The pickle. loads() function is used to unpickle an
object from a byte stream. Unpickling reconstructs the original object with the same
state and data as it had before pickling.

13. What is the difference between a Python


generator and a list?
There are several differences between a Python generator and a list:

1. Memory Usage: Generators are memory-efficient because they generate values


on the fly as you iterate over them, whereas lists store all their values in memory
at once.

2. Computation: Generators provide values lazily, which means they generate the
next value only when requested. Lists, on the other hand, are computed eagerly,
meaning all their values are computed and stored upfront.

3. Iteration: Generators are iterable objects, and you can iterate over them using a
loop or other iterable constructs. However, once you iterate over a generator and
consume its values, they cannot be accessed again. Lists, on the other hand,
can be iterated over multiple times, and their values can be accessed at any
index.

4. Size: Lists have a fixed size, and you can access individual elements directly
using indexing. Generators do not have a fixed size, and you can only access
their elements sequentially by iterating over them.

5. Modifiability: Lists are mutable, which means you can modify, append, or
remove elements after the list is created. Generators, by design, are immutable
and do not support in-place modifications.

6. Creation: Lists are created by enclosing a sequence of elements within square


brackets [ ], while generators are created using generator functions or generator
expressions.

100 Python Interview Questions & Answer For Data Science 8


14. What is the difference between range and xrange
functions?
In Python 2.x, there were two built-in functions for generating sequences of
numbers: range() and xrange(). However, in Python 3.x, xrange() was removed and
range() became the only built-in function for generating sequences of numbers.

So, in Python 3.x, there is no difference between range() and xrange() because
xrange() no longer exists. In Python 2.x, the main difference between range() and
xrange() lies in how they generate and store sequences of numbers:

1. range(): The range() function returns a list containing all the numbers within the
specified range. For example, range(5) will return a list [0, 1, 2, 3, 4]. This means
that range() generates the entire sequence in memory, which can be memory-
intensive for large ranges.

2. xrange(): The xrange() function returns a generator object that generates


numbers on-the-fly as you iterate over it. It does not generate the entire
sequence in memory at once. Instead, it generates one number at a time, which
saves memory. This is particularly useful when dealing with large ranges
because you don't need to store the entire sequence in memory.

Here's an example in Python 2.x to illustrate the difference:

15. How does break, continue and pass work?


In Python, break, continue, and pass are control flow statements used to alter the
normal flow of execution within loops and conditional statements. Here's how each of
them works:

100 Python Interview Questions & Answer For Data Science 9


break statement: The break statement is used to terminate the execution of the
innermost loop (i.e., the loop in which it is encountered). When the break
statement is encountered, the loop is immediately exited, and the program
continues with the next statement after the loop. Here's an example:

continue statement: When the continue statement is encountered, the


remaining statements within the loop for that iteration are skipped, and the loop
proceeds with the next iteration. Here's an example:

pass statement: It doesn't do anything and acts as a placeholder. It is used


when you need a statement syntactically but don't want any code to be
executed. Here's an example:

16. How can you randomize the items of a list in place


in Python?
To randomize the items of a list in place (i.e., modifying the original list), you can
make use of the random.shuffle() function from the random module in Python. The

100 Python Interview Questions & Answer For Data Science 10


shuffle() function shuffles the elements of a list randomly.

Here's an example:

Output:

17. What is a Python iterator?


A Python iterator is an object that can be used to iterate over a sequence of values.
It provides a __next__() method that returns the next value in the sequence, and
raises a StopIteration exception when there are no more values.

Here's an example of a simple iterator:

100 Python Interview Questions & Answer For Data Science 11


18. How do you handle exceptions in Python?
In Python, exceptions are used to handle errors and exceptional situations that may
occur during the execution of a program. Exception handling allows you to gracefully
handle errors and control the flow of your program when an exception is raised. To
handle exceptions in Python, you can use a combination of the try, except, else, and
finally blocks.
Example:

100 Python Interview Questions & Answer For Data Science 12


Here's a breakdown of the different parts of exception handling:

1. try: The try block is where you put the code that may raise an exception. If an
exception occurs within this block, the execution jumps to the appropriate except
block.

2. except: The except block catches specific exceptions and provides the handling
code for each exception type. You can have multiple except blocks to handle
different types of exceptions.

3. else: The else block is optional and is executed if no exception occurs in the try
block.

4. finally: The finally block is optional and is always executed, regardless of


whether an exception occurred or not.

19. What is the difference between `finally` and `else`


in a Python `try`/`except` block?
In a Python try/except block, finally and else are optional clauses that serve different
purposes:
finally block: The finally block is always executed, regardless of whether an
exception occurred or not. It is typically used for cleanup operations or releasing
resources that need to be performed regardless of the outcome of the try block. The
finally block is executed even if an exception is raised and not caught by any of the

100 Python Interview Questions & Answer For Data Science 13


except blocks. It ensures that certain code is executed regardless of exceptions or
successful execution.
else block: The else block is executed only if the try block completes successfully
without any exceptions being raised. It is optional and provides a place to put code
that should be executed when no exceptions occur. If an exception is raised within
the try block, the code in the else block is skipped, and the program flow jumps to
the appropriate except block or propagates the exception up.

20. What is a list comprehension in Python?


List comprehension is a concise way to create lists in Python. It allows you to
generate a new list by specifying an expression, followed by one or more for and if
clauses.

The basic syntax of list comprehension is as follows:

# syntax of list comprehension


new_list = [expression for item in iterable if condition]

Here's a breakdown of the different parts:


1. expression: The expression to be evaluated for each item in the iterable.

2. item: A variable that represents each item in the iterable.

3. iterable: A sequence, such as a list, tuple, or string, that you want to iterate
over.

4. condition (optional): A condition that filters the items based on a Boolean


expression. Only items for which the condition evaluates to True are included in
the new list.

Here are a few examples to illustrate how list comprehension


works:
Example: Creating a new list of squares of numbers from 1 to 5:

100 Python Interview Questions & Answer For Data Science 14


21. What is a dictionary comprehension in Python?
Dictionary comprehension is a similar concept to list comprehension, but instead
of creating lists, it allows you to create dictionaries in a concise way. You can
generate a new dictionary by specifying key-value pairs using an expression and
one or more for and if clauses. The basic syntax of dictionary comprehension is as
follows:

new_dict = {key_expression: value_expression for item in iterable if condition}

Let's break down the different parts:


1. key_expression: The expression to determine the keys of the new dictionary.

2. value_expression: The expression to determine the values of the new


dictionary.

3. item: A variable representing each item in the iterable.

4. iterable: A sequence that you want to iterate over, such as a list, tuple, or string.

5. condition (optional): A condition that filters the items based on a Boolean


expression. Only items for which the condition evaluates to True are included in
the new dictionary.

Here are a few examples of dictionary comprehension:


Example: Creating a new dictionary of squares for numbers from 1 to 5:

22. What is the difference between a shallow copy


and a deep copy in Python?
A shallow copy and a deep copy are two different methods used to create copies of
objects, including lists, dictionaries, and custom objects. The main difference
between a shallow copy and a deep copy lies in how they handle nested objects or
references within the original object.

100 Python Interview Questions & Answer For Data Science 15


A shallow copy creates a new object but maintains references to the objects found in
the original object. In other words, it creates a new object and copies the references
to the nested objects in the original object.

If the original object contains mutable objects (e.g., lists or dictionaries), changes
made to the mutable objects in either the original or the copied object will affect both.

Example of Shallow Copy:

A deep copy creates a completely independent copy of the original object, including
all the nested objects. It recursively copies all objects found in the original object.
Changes made to the original object or its nested objects will not affect the deep
copy, and vice versa.

Example of deep copy:

100 Python Interview Questions & Answer For Data Science 16


23. How do you sort a list in Python?
To sort a list in Python, you can use the ` sorted() ` function, which returns a new
sorted list, or the ` sort() ` method, which sorts the list in-place. Both functions take
an optional ` key ` parameter, which is used to specify a function that returns a value
to use for sorting.

24. How do you reverse a list in Python?


To reverse a list in Python, you can use either the reverse() method or slicing .
Here's how you can use each method:

100 Python Interview Questions & Answer For Data Science 17


reverse() method: The reverse() method is a list method that reverses the order of
the elements in the list in place, meaning it modifies the original list.

Slicing: You can reverse a list using slicing by specifying the step value as -1, which
traverses the list in reverse order. This method returns a new reversed list without
modifying the original list.

25. How do you find the length of a list in Python?


To find the length of a list in Python, you can use the `len()` function, which returns
the number of elements in the list.

26. How do you concatenate two lists in Python?


To concatenate two lists in Python, you can use the + operator or the extend()
method. Both methods allow you to combine the elements of two lists into a single
list. Here's how you can use each method:
Using the + operator: The + operator concatenates two lists by creating a new list
that contains all the elements from both lists.

100 Python Interview Questions & Answer For Data Science 18


Using the extend() method: The extend() method modifies the original list by
appending all the elements from another list to the end of it.

27. How do you check if an element is in a list in


Python?
To check if an element is in a list in Python, you can use the in operator. The in
operator returns True if the element is found in the list and False otherwise.
Here's an example:

In this example, the element in my_list expression checks if element (which is set to
3) is present in my_list. Since 3 is in my_list, the condition is true, and the statement
"Element is in the list" is printed.

28. How do you remove an element from a list in


Python?
In Python, there are multiple ways to remove an element from a list. Here are a few
common methods:
remove() method: The remove() method removes the first occurrence of a specified
element from the list. If the element is not found, it raises a ValueError.

Here's an example:

100 Python Interview Questions & Answer For Data Science 19


del statement: The del statement is used to remove an element from a list by its
index. It can also be used to remove a slice of elements from a list.

Here's an example:

pop() method: The pop() method removes and returns an element from a list
based on its index. If no index is specified, it removes and returns the last element.
Here's an example:

29. How do you split a string into a list in Python?


To split a string into a list in Python, you can use the ` split() ` method, which splits a
string into a list of substrings based on a specified delimiter.
For example:

100 Python Interview Questions & Answer For Data Science 20


You can also specify a custom delimiter for splitting the string. For example, to split
the string based on commas, you can pass ',' as the delimiter to the split() method:

30. How do you join a list into a string in Python?


To join a list into a string in Python, you can use the `join()` method, which
concatenates the elements of a list using a specified separator string.

For example:

my_list = ['hello', 'world']

my_string = ' '.join(my_list)  # joins elements with a space separator

print(my_string)  # outputs "hello world"

my_list = ['1', '2', '3', '4', '5']

my_string = ','.join(my_list)  # joins elements with a comma separator

print(my_string)  # outputs "1,2,3,4,5"

31. How do you convert a string to a number in


Python?
To convert a string to a number in Python, you can use the ` int() ` or ` float() `
function, depending on the type of number you want to convert to.

For example:

my_string = "42"

my_int = int(my_string)  # converts to integer

100 Python Interview Questions & Answer For Data Science 21


print(my_int)  # outputs 42

my_string = "3.14"

my_float = float(my_string)  # converts to float

print(my_float)  # outputs 3.14

32. How do you convert a number to a string in


Python?
To convert a number to a string in Python, you can use the ` str() ` function, which
returns a string representation of the number.

For example:

my_int = 42

my_string = str(my_int)  # converts to string

print(my_string)  # outputs "42"

my_float = 3.14

my_string = str(my_float)  # converts to string

print(my_string)  # outputs "3.14"

33. How do you read input from the user in Python?


To read input from the user in Python, you can use the ` input() ` function, which
reads a line of text from the user and returns it as a string.

For example:

100 Python Interview Questions & Answer For Data Science 22


Note that the input() function always returns a string, even if the user enters a
number or other data type. If you need to convert the input to a different data type,
such as an integer or float, you can use appropriate conversion functions like int()

or float() .

34. How do you open a file in Python?


To open a file in Python, you can use the ` open() ` function, which returns a file
object. The function takes two arguments: the filename and the mode in which to
open the file.

For example:

file = open("example.txt", "r")  # opens file for reading

35. How do you read data from a file in Python?


To read data from a file in Python, you can use the ` read() ` method of the file object,
which reads the entire contents of the file as a string. Alternatively, you can use the
` readline() ` method to read one line of the file at a time.

For example:

file = open("example.txt", "r")


data = file.read()
print(data)
file.close()
# read one line at a time

file = open("example.txt", "r")


line = file.readline()
while line:

100 Python Interview Questions & Answer For Data Science 23


print(line)
line = file.readline()
file.close()

36. How do you write data to a file in Python?


To write data to a file in Python, you can use the ` write() ` method of the file object,
which writes a string to the file. Alternatively, you can use the ` writelines() ` method
to write a list of strings to the file.

For example:

# write a string to file

file = open("example.txt", "w")

file.write("Hello, world!\n")

file.close()

# write a list of strings to file

lines = ["This is the first line.\n", "This is the second line.\n", "This is the third
line.\n"]

file = open("example.txt", "w")

file.writelines(lines)

file.close()

37. How do you close a file in Python?


To close a file in Python, you can call the close() method of the file object.

For example:

file = open("example.txt", "r")

data = file.read()

file.close()

38. How do you check if a file exists in Python?

100 Python Interview Questions & Answer For Data Science 24


To check if a file exists in Python, you can use the ` os.path. isfile() ` function,
which returns `True` if the file exists and `False` otherwise.

For example:

import os.path

if os.path.isfile("example.txt"):

print("The file exists.")

else:

print("The file does not exist.")

39. How do you get the current working directory in


Python?
To get the current working directory in Python, you can use the
os. getcwd() `function, which returns a string representing the current working
directory.

For example:

import os

cwd = os.getcwd()

print(cwd)

40. How do you change the current working directory


in Python?
To change the current working directory in Python, you can use the ` os.chdir() `
function, which changes the current working directory to the specified path.

For example:

100 Python Interview Questions & Answer For Data Science 25


import os

os.chdir("/path/to/new/directory")

41. How do you get a list of files in a directory in


Python?
To get a list of files in a directory in Python, you can use the os.listdir() function,
which returns a list of filenames in the specified directory.

For example:

import os

files = os.listdir("/path/to/directory")

print(files)

42. How do you create a directory in Python?


To create a directory in Python, you can use the os. mkdir() function, which creates
a new directory with the specified name in the current working directory.

For example:

import os

os.mkdir("new_directory")

43. How do you remove a directory in Python?


To remove a directory in Python, you can use the os. rmdir() function, which
removes the directory with the specified name in the current working directory.

For example:

import os

os.rmdir("directory_to_remove")

100 Python Interview Questions & Answer For Data Science 26


44. What is a metaclass in Python?
A metaclass is a class that defines the behavior and structure of other classes. In
other words, a metaclass is the class of a class. It allows you to define how classes
should be created and what attributes and methods they should have.

The default metaclass is the type metaclass, which is responsible for creating and
defining the behavior of all classes. However, you can create your own metaclass by
subclassing type or using the __metaclass__ attribute in a class definition.

Metaclass provide a way to modify the behavior of class creation and allow you to
add or modify attributes, methods, or behavior for classes that are created using the
metaclass.

45. How do you define a metaclass in Python?


To define a metaclass in Python, you define a new class that inherits from the built-in
type class. The new class can define custom behavior for creating and initializing
classes.

For example:

class MyMeta(type):

def __new__(cls, name, bases, attrs):

# Custom class creation logic goes here

return super().__new__(cls, name, bases, attrs)

class MyClass(metaclass=MyMeta):

# Class definition goes here👇

46. How do you filter files by extension in Python?


To filter files by extension in Python, you can use a list comprehension to create a
new list that contains only the files with the specified extension.

For example: To filter all .txt files in a directory:

100 Python Interview Questions & Answer For Data Science 27


import os

files = os.listdir("/path/to/directory")

txt_files = [file for file in files if file.endswith(".txt")]

for file in txt_files:


print(file)

47. How do you read a file line by line in Python?


To read a file line by line in Python, you can use a ` for ` loop to iterate over the lines
of the file.

For example:

with open("file_to_read") as file:

for line in file:


print(line)

48. How do you read the contents of a file into a


string in Python?
To read the contents of a file into a string in Python, you can use the `read()` method
of the file object.

For example:

with open("file_to_read") as file:


contents = file.read()
print(contents)

49. How do you write a string to a file in Python?


To write a string to a file in Python, you can use the ` write() ` method of the file
object.

100 Python Interview Questions & Answer For Data Science 28


For example:

with open("file_to_write", "w") as file:


file.write("Hello, world!")

50. How do you append a string to a file in Python?


To append a string to a file in Python, you can open the file in append mode ("a")

and use the write() method of the file object.

For example:

with open("file_to_append", "a") as file:


file.write("Hello, world!")

51. Write a one-liner that will count the number of


capital letters in a file.
To count the number of capital letters in a file using a one-liner in Python, you can
combine file reading, character filtering, and counting using a generator expression.

Here's an example:

count = sum(1 for line in open('filename.txt') for char in line if char.isupper())

In the above code, 'filename.txt' represents the name or path of the file you want to
count the capital letters in.

The open() function is used to open the file, and the file is iterated line by line using
the first for loop (for line in open(' filename.txt ')) .

Then, for each line, the characters are iterated using the second for loop (for char in

line) .

The char. isupper() condition checks if the character is uppercase. The generator
expression 1 for line in open('filename.txt') for char in line if char. isupper()
generates 1 for each uppercase character.

Finally, the sum() function is used to add up all the 1 occurrences, resulting in the
count of capital letters, which is stored in the count variable.

100 Python Interview Questions & Answer For Data Science 29


52. What is NumPy? Why should we use it?
NumPy (also called Numerical Python) is a highly flexible, optimized, open-source
package meant for array processing. It provides tools for delivering high-end
performance while dealing with
N-dimensional powerful array objects.

It is also beneficial for performing scientific computations, mathematical, and logical


operations, sorting operations, I/O functions, basic statistical and linear algebra-
based operations along with random simulation and broadcasting functionalities.

Due to the vast range of capabilities, NumPy has become very popular and is the
most preferred package. The following image represents the uses of NumPy.

53. How are NumPy arrays better than Python’s lists?


Python lists support storing heterogeneous data types whereas NumPy arrays
can store datatypes of one nature itself. NumPy provides extra functional
capabilities that make operating on its arrays easier which makes NumPy array
advantageous in comparison to Python lists as those functions cannot be
operated on heterogeneous data.

NumPy arrays are treated as objects which results in minimal memory usage.
Since Python keeps track of objects by creating or deleting them based on the

100 Python Interview Questions & Answer For Data Science 30


requirements, NumPy objects are also treated the same way. This results in
lesser memory wastage.

NumPy arrays support multi-dimensional arrays.

NumPy provides various powerful and efficient functions for complex


computations on the arrays.

NumPy also provides various range of functions for BitWise Operations, String
Operations, Linear Algebraic operations, Arithmetic operations etc. These are
not provided on Python’s default lists.

54. What are ndarrays in NumPy?


ndarray object is the core of the NumPy package. It consists of n-dimensional arrays
storing elements of the same data types and also has many operations that are done
in compiled code for optimized performance. These arrays have fixed sizes defined
at the time of creation.

Following are some of the properties of ndarrays:

When the size of ndarrays is changed, it results in a new array and the original
array is deleted.

The ndarrays are bound to store homogeneous data.

They provide functions to perform advanced mathematical operations in an


efficient manner.

55. What are the ways for creating a 1D array?


In NumPy, there are several ways to create 1D, 2D, and 3D arrays. Here are some
common methods:

Using the array() function and providing a Python list or tuple:

import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

Using the arange() function to generate a range of values:

import numpy as np

my_array = np.arange(1, 6)

100 Python Interview Questions & Answer For Data Science 31


Using the linspace() f unction to generate evenly spaced values:

import numpy as np

my_array = np.linspace(1, 5, 5)

56. How is np.mean() different from np.average() in


NumPy?
np.mean()method calculates the arithmetic mean and provides additional options for
input and results. For example, it has the option to specify what data types have to
be taken, where the result has to be placed etc.
np.average() computes the weighted average if the weights parameter is specified. In
the case of weighted average, instead of considering that each data point is
contributing equally to the final average, it considers that some data points have
more weightage than the others (unequal contribution).

57. How can you reverse a NumPy array?


To reverse a NumPy array, you can use the indexing and slicing feature of NumPy.
Here are two common approaches:

Using indexing and slicing: For a 1D array, you can use the [::-1] slicing to
reverse the array:

import numpy np

my_array = np.array([1, 2, 3, 4, 5])


reversed_array = my_array[::-1]
print (reversed_array)

Using the np.flip() function: The np.flip() function can be used to reverse an
array along a specified axis. By default, it reverses the array along all axes.
Here's an example:

import numpy np

my_array = np.array([1, 2, 3, 4, 5])


reversed_array = np.flip(my_array)
print (reversed_array)

100 Python Interview Questions & Answer For Data Science 32


58. How do you count the frequency of a given
positive value appearing in the NumPy array?
We can make use of the bincount() function to compute the number of times a given
value is there in the array. This function accepts only positive integers and Boolean
expressions as the arguments.

The np.bincount() function in NumPy is used to count the occurrences of non-


negative integers in an array and return the frequency of each integer.

It is particularly useful when dealing with discrete data or integer-valued data. The
function operates on 1D arrays and returns a new array with the count of
occurrences for each integer value.

Example:

import numpy as np

arr = np.array([1, 2, 3, 2, 4, 5, 2])


bin_counts = np.bincount(arr)

print(bin_counts)

Output:

[0 1 4 1 1 1 ]

100 Python Interview Questions & Answer For Data Science 33


59. What is Pandas in Python?
Pandas is an open-source Python package that is most commonly used for data
science, data analysis, and machine learning tasks. It is built on top of another
library named Numpy .
It provides various data structures and operations for manipulating numerical data
and time series and is very efficient in performing various functions like data
visualization , data manipulation , data analysis , etc.

60. Mention the different types of Data Structures in


Pandas?
Pandas have three different types of data structures. It is due to these simple and
flexible data structures that it is fast and efficient.

1. Series - It is a one-dimensional array-like structure with homogeneous data


which means data of different data types cannot be a part of the same series. It
can hold any data type such as integers, floats, and strings and its values are
mutable i.e. it can be changed but the size of the series is immutable i.e. it
cannot be changed.

2. DataFrame - It is a two-dimensional array-like structure with heterogeneous


data. It can contain data of different data types and the data is aligned in a
tabular manner. Both size and values of DataFrame are mutable.

3. Panel - The Pandas have a third type of data structure known as Panel, which is
a 3D data structure capable of storing heterogeneous data but it isn’t that widely

100 Python Interview Questions & Answer For Data Science 34


used.

61. What are the significant features of the pandas


Library?
Pandas library is known for its efficient data analysis and state-of-the-art data
visualization. The key features of the panda’s library are as follows:

Fast and efficient DataFrame object with default and customized indexing.

High-performance merging and joining of data.

Data alignment and integrated handling of missing data.

Label-based slicing, indexing, and subsetting of large data sets.

Reshaping and pivoting of data sets.

Tools for loading data into in-memory data objects from different file formats.

Columns from a data structure can be deleted or inserted.

Group by data for aggregation and transformations.

62. Define Series in Pandas?


It is a one-dimensional array-like structure with homogeneous data which means
data of different data types cannot be a part of the same series.

It can hold any data type such as integers, floats, and strings and its values are
mutable i.e. it can be changed but the size of the series is immutable i.e. it cannot be
changed.

By using a ‘ series ’ method, we can easily convert the list, tuple, and dictionary into a
series. A Series cannot contain multiple columns.

63. Define DataFrame in Pandas?


It is a two-dimensional array-like structure with heterogeneous data. It can contain
data of different data types and the data is aligned in a tabular manner i.e. in rows
and columns and the indexes with respect to these are called row index and column
index respectively.
Both size and values of DataFrame are mutable. The columns can be
heterogeneous types like int and bool. It can also be defined as a dictionary of
Series.

100 Python Interview Questions & Answer For Data Science 35


The syntax for creating a dataframe:

import pandas as pd
dataframe = pd.DataFrame( data, index, columns, dtype)

Parameter explanation Here:

data - It represents various forms like series , map , ndarray , lists , dict , etc.

index - It is an optional argument that represents an index to row labels.

columns - Optional argument for column labels.

Dtype - It represents the data type of each column. It is an optional parameter

64. What are the different ways in which a series can


be created in pandas?
In Pandas, there are several ways to create a Series, which is a one-dimensional
labeled array. Here are some common methods:

From a Python list: You can create a Series by passing a Python list to the
pd.Series() constructor:

From a NumPy array: You can create a Series from a NumPy array by passing
the array to the pd.Series() constructor:

From a dictionary: You can create a Series from a dictionary, where the keys of
the dictionary will be the index labels of the Series and the values will be the
data:

100 Python Interview Questions & Answer For Data Science 36


65. How can we create a copy of the series in
Pandas?
We can create a copy of the series by using the following syntax:
Series. copy(deep=True) The default value for the deep parameter is set to True.

When the value of deep= True , the creation of a new object with a copy of the calling
object’s data and indices takes place.

Modifications to the data or indices of the copy will not be reflected in the original
object whereas when the value of deep= False , the creation of a new object will take
place without copying the calling object’s data or index i.e. only the references to the
data and index will be copied.

Any changes made to the data of the original object will be reflected in the shallow
copy and vice versa.

66. Explain Categorical data in Pandas?


Categorical data is a discrete set of values for a particular outcome and has a fixed
range. Also, the data in the category need not be numerical, it can be textual in
nature.

Examples are gender, social class, blood type, country affiliation, observation
time, etc. There is no hard and fast rule for how many values a categorical value
should have. One should apply one’s domain knowledge to make that determination
on the data sets

67. How to Read Text Files with Pandas?


There are multiple ways in which we read a text file using Pandas.

Using read_csv(): CSV is a comma-separated file i.e. any text file that uses
commas as a delimiter to separate the record values for each field. Therefore, in
order to load data from a text file we use pandas.read_csv() method.

Using read_table(): This function is very much like the read_csv() function, the
major difference being that in read_table the delimiter value is ‘\t’ and not a

100 Python Interview Questions & Answer For Data Science 37


comma which is the default value for read_csv(). We will read data with the
read_table function making the separator equal to a single space(‘ ‘) .

Using read_fwf(): It stands for fixed-width lines. This function is used to load
DataFrames from files. Another very interesting feature is that it supports
optionally iterating or breaking the file into chunks. Since the columns in the text
file were separated with a fixed width, this read_fwf() read the contents
effectively into separate columns.

68. How are iloc() and loc() different?


The iloc() and loc() functions in Pandas are used to access and retrieve data
from a DataFrame or Series. However, they have some differences in terms of the
indexing methods they use. Here's how they differ:

iloc(): It allows you to access data by specifying the integer-based positions of rows
and columns. The indexing starts from 0 for both rows and columns. You can use
integer-based slicing and indexing ranges to select specific rows or columns. The
iloc() function does not include the end value when slicing with ranges. Here's an
example to illustrate the usage of iloc():

loc(): The loc() function is primarily used for label-based indexing. It allows you to
access data by specifying labels or Boolean conditions for rows and column names.
You can use label-based slicing and indexing ranges to select specific rows or
columns.

The loc() function includes the end value when slicing with ranges. Here's an
example to illustrate the usage of loc():

100 Python Interview Questions & Answer For Data Science 38


In summary, iloc() is used for integer-based indexing, while loc() is used for label-
based indexing. The choice between iloc() and loc() depends on whether you want
to access data based on integer positions or label names.

69. How would you convert continuous values into


discrete values in Pandas?
To convert continuous values into discrete values in Pandas, you can use the
pd.cut() function.

The pd.cut() function allows you to divide a continuous variable into bins and assign
discrete labels to the values based on their bin membership.

Example: How you can use pd.cut() to convert continuous values into discrete
categories:

100 Python Interview Questions & Answer For Data Science 39


Output:

70. What’s the difference between interpolate() and


fillna() in Pandas?
Both the interpolate() and fillna() functions are used to fill missing or NaN (Not a
Number) values in a DataFrame or Series. However, they differ in their approach to
filling the missing values:
interpolate():  It is primarily used for filling missing values in time series or other
ordered data where the values are expected to have a smooth variation. The
function estimates the missing values based on the values of neighboring data
points, using various interpolation methods such as linear, polynomial, spline, etc.

100 Python Interview Questions & Answer For Data Science 40


Output:

fillna(): The fillna() function in Pandas is used to fill missing values with a specified
scalar value or with values from another DataFrame or Series. The function replaces
the missing values with the provided scalar value or with values from a specified
Series or DataFrame.

Here's an example of using fillna() to fill missing values with a constant value:

Output;

100 Python Interview Questions & Answer For Data Science 41


71. How to add a row to a Pandas DataFrame?
To add a row to a Pandas DataFrame, you can use the append() function or the
loc[] indexing method. Here are examples of both approaches:
Using append() function: The append() function is used to concatenate rows or
DataFrames together. You can create a new DataFrame representing the row you
want to add, and then append it to the original DataFrame.

Output:

Using loc[] indexing: Another approach is to use the loc[] indexing method to
directly assign values to a new row.

100 Python Interview Questions & Answer For Data Science 42


Output:

72. Write a Pandas program to find the positions of


numbers that are multiples of 5 of a given series
To find the positions of numbers that are multiples of 5 in a given pandas Series, you
can use the numpy.where() function along with boolean indexing.

Here's an example program:

100 Python Interview Questions & Answer For Data Science 43


Output:

In the above program, we create a sample pandas Series named series containing
some numbers. We then use the % operator to check for multiples of 5 by applying
the condition series % 5 == 0 .

This condition returns a Boolean Series with True values where the numbers are
multiples of 5 and False values otherwise. Next, we use numpy.where() along with
Boolean indexing ([0]) to retrieve the positions of True values in the Boolean
Series.

The result is an array of positions where the numbers are multiples of 5.

73. Write a Pandas program to display the most


frequent value in a given series and replace
everything else as “replaced” in the series.
To display the most frequent value in a given pandas Series and replace everything
else with "replaced", you can use the value_counts() function to find the most
frequent value and then use the replace() function to replace the remaining values.

Here's an example program:

100 Python Interview Questions & Answer For Data Science 44


Output:

In the above program, we create a sample pandas Series named series with some
values. We use the value_counts() function to count the occurrences of each value in
the Series and then retrieve the most frequent value using idxmax().

The idxmax() function returns the index label of the maximum value, which
corresponds to the most frequent value in this case. Next, we use Boolean indexing
(series != most_frequent) to create a mask of values that are not equal to the most

frequent value.

We use this mask to select those values from the Series and replace them with
"replaced" using the replace() function.

100 Python Interview Questions & Answer For Data Science 45


Finally, we print the Series with the most frequent value replaced as "replaced".

74. Write a Python program that removes vowels


from a string.

Output:

75. Write a Python program that rotates an array by


two positions to the right.

Output:

100 Python Interview Questions & Answer For Data Science 46


76. Write a Python code to find all nonrepeating
characters in the String

Output:

77. Write a Python program to calculate the power


using ‘while-loop’

100 Python Interview Questions & Answer For Data Science 47


Output:

78. Write a program to check and return the pairs of a


given array A whose sum value is equal to a target
value N.

100 Python Interview Questions & Answer For Data Science 48


Output:

79. Write a Program to match a string that has the


letter ‘a’ followed by 4 to 8 'b’s.

100 Python Interview Questions & Answer For Data Science 49


Output:

80. Write a Program to convert date from yyyy-mm-dd


format to dd-mm-yyyy format using regular
expression.

Output:

100 Python Interview Questions & Answer For Data Science 50


81. Write a Program to combine two different
dictionaries. While combining, if you find the same
keys, you can add the values of these same keys.
Output the new dictionary

Output:

100 Python Interview Questions & Answer For Data Science 51


82. What is matplotlib and explain its features.
Matplotlib is a widely-used plotting library in Python that provides a variety of high-
quality 2D and limited 3D visualizations. It is capable of creating a wide range of
static, animated, and interactive plots for data analysis, exploration, and presentation
purposes.

Features of Matplotlib:
1. Plotting Functions: Matplotlib provides a comprehensive set of plotting
functions that allow you to create various types of plots, including line plots,
scatter plots, bar plots, histogram plots, pie charts, and more.

2. Object-Oriented API: Matplotlib has an object-oriented API that allows fine-


grained control over plot elements. You can create and manipulate individual plot
elements, such as figures, axes, lines, and markers, giving you more flexibility
and control over the plot layout and appearance.

3. Integration with NumPy: Matplotlib seamlessly integrates with the NumPy


library, enabling you to plot data stored in NumPy arrays efficiently. This
integration allows you to visualize and analyze numerical data easily.

4. Customization and Styling: Matplotlib provides extensive customization


options to control the appearance of your plots. You can customize various
aspects, such as axes limits, tick marks, labels, grid lines, legends, and color
schemes. Matplotlib also supports the use of style sheets, allowing you to
quickly apply predefined styles or create your own custom styles.

83. Can you provide me examples of when a scatter


graph would be more appropriate than a line chart or
vice versa?

100 Python Interview Questions & Answer For Data Science 52


A scatter graph would be more appropriate than a line chart when you are looking to
show the relationship between two variables that are not linearly related.

For example, if you were looking to show the relationship between a person’s age
and their weight, a scatter graph would be more appropriate than a line chart. A line
chart would be more appropriate than a scatter graph when you are looking to show
a trend over time.

For example, if you were looking at the monthly sales of a company over the course
of a year, a line chart would be more appropriate than a scatter graph.

84. How do you customize the appearance of your


plots in matplotlib?
In Matplotlib, you can customize the appearance of your plots in various ways. Here
are some common customization options:

1. Titles and Labels: Set the title of the plot using plt.title() or ax.set_title(). Set
labels for the x-axis and y-axis using plt. xlabel() and plt. ylabel() or
ax. set_xlabel() and ax. set_ylabel() .

2. Legends: Add a legend to your plot using plt. legend() or ax. legend() .
Customize the legend location, labels, and other properties.

3. Grid Lines: Display grid lines on the plot using plt.grid(True) or ax.grid(True).
Customize the grid appearance with options like linestyle , linewidth , and
color .

4. Colors, Line Styles, and Markers: Control the colors of lines, markers, and
other plot elements using the color parameter in plotting functions. Customize
line styles (e.g., solid, dashed, dotted) using the linestyle parameter. Specify
markers (e.g., dots, triangles, squares) using the marker parameter.

5. Axis Limits and Ticks: Set custom axis limits using plt. xlim() and
plt. ylim() or ax. set_xlim() and ax. set_ylim() .

Customize the appearance of ticks on the x-axis and y-axis using plt. xticks()

and plt. yticks() or ax. set_xticks() and ax. set_yticks() .

6. Background and Plot Styles: Change the background color of the plot using
plt. figure(facecolor='color') or ax. set_facecolor('color') .

Apply predefined styles or create custom styles using


plt.style. use('style_name') or plt. style.context('style_name') .

100 Python Interview Questions & Answer For Data Science 53


7. Annotations and Text: Add annotations and text to your plot using plt. text()
or ax. text() . Customize the font size, color, and other properties of the text.

85. How to create a histogram plot in matplotlib ?


To create a histogram plot in Matplotlib, you can use the plt. hist() function. Here's
an example that demonstrates how to create a histogram plot:

Output:

100 Python Interview Questions & Answer For Data Science 54


86. How do you create a figure with multiple subplots
using Matplotlib?
To create a figure with multiple subplots using Matplotlib, you can use the
plt. subplots() function. Here's an example that demonstrates how to create a

figure with two subplots side by side:

100 Python Interview Questions & Answer For Data Science 55


Output:

In this example, We import the necessary modules, including matplotlib. pyplot as


plt and numpy as np. We generate sample data using the np. linspace() function to
create an array of values for the x-axis and the np.sin() and np.cos() functions to
compute the corresponding y-values.

100 Python Interview Questions & Answer For Data Science 56


We create a figure with two subplots using plt. subplots ( 1, 2, figsize=(10, 4) ) .
The arguments (1, 2) specify that we want one row and two columns of subplots,
and figsize=(10, 4) sets the size of the figure.

We plot the data on each subplot using the respective axes objects. Customizations
such as titles, labels, and legends are set using methods like set_title() ,
set_xlabel() , set_ylabel() , and legend() .
We use plt. tight_layout() to adjust the spacing between the subplots for better
visualization.

Finally, we use plt. show() to display the figure with the subplots.

87. What is the difference between Seaborn and


Matplotlib?
Seaborn is built on top of Matplotlib and provides a higher-level interface for creating
statistical graphics. While Matplotlib offers more flexibility and control over the plot
elements. Seaborn simplifies the creation of common statistical plots by providing
intuitive functions and sensible default settings. Seaborn also integrates well with
Pandas data structures.

88. How to create a heatmap in Seaborn?


To create a heatmap in Seaborn, you can use the heatmap() function. Here's an
example that demonstrates how to create a heatmap:

100 Python Interview Questions & Answer For Data Science 57


Output:

In this example, We import the necessary modules, including seaborn as sns and
numpy as np. We create a 2D array of random values using np.random.rand() .

This will serve as our data for the heatmap. We use the sns.heatmap() function to
create the heatmap. The data array is passed as the first argument.

100 Python Interview Questions & Answer For Data Science 58


Additional parameters can be used to customize the appearance of the heatmap. In
this example, annot=True enables the display of data values on the heatmap, and
cmap='YlGnBu' sets the color map.

Finally, we use plt.show() to display the heatmap.

89. How to create a catplot in Seaborn?


To create a categorical plot (catplot) in Seaborn, you can use the catplot()
function. This function provides a high-level interface for creating various types of
categorical plots. Here's an example that demonstrates how to use catplot() :

Output:

100 Python Interview Questions & Answer For Data Science 59


We import the necessary modules, including seaborn as sns. We load the built-in
'tips' dataset from Seaborn using the sns. load_dataset() function.

This dataset contains information about restaurant tips. We use the sns. catplot()

function to create a categorical plot.

The x parameter specifies the variable to be plotted on the x-axis ('day' in this
example), the y parameter specifies the variable to be plotted on the y-axis
('total_bill' in this example), the data parameter specifies the dataset to use (the
'tips' dataset in this example), and the kind parameter specifies the type of
categorical plot to create ('box' plot in this example).

Finally, we use plt. show() to display the categorical plot.

90. How to create a distplot in Seaborn?


The distplot() function in Seaborn is used to create a distribution plot, which
displays the distribution of a univariate set of observations.

100 Python Interview Questions & Answer For Data Science 60


Here's an example that demonstrates how to create a
distribution plot using distplot() :

Output:

In this example, We import the necessary modules, including seaborn as sns and
numpy as np . We generate a random dataset using np.random. randn() .

100 Python Interview Questions & Answer For Data Science 61


We use the sns. distplot() function to create the distribution plot. The data array is
passed as the first argument. Additional parameters can be used to customize the
appearance of the plot. In this example, kde= True enables the kernel density
estimation line, and hist= True enables the histogram representation.

Finally, we use plt. show() to display the distribution plot. When you run this code, it
will create a distribution plot based on the provided random dataset.

The plot will show the estimated probability density function (PDF) using a kernel
density estimation (KDE) line, as well as a histogram representation of the data.

91. You have a time series dataset, and you want to


visualize the trend and seasonality in the data using
Matplotlib. What type of plot would you use, and how
would you create it?
In this scenario, I would use a line plot to visualize the trend and seasonality in the
time series data. To create the line plot in Matplotlib, I would import the necessary
libraries, create a figure and axes object, and then use the plot function to plot the
data points connected by lines.

92. You have a dataset with a single continuous


variable, and you want to visualize its distribution.
Would you choose a histogram or a box plot, and
why?
In this scenario, I would choose a histogram to visualize the distribution of the
continuous variable. A histogram provides information about the frequency
distribution and the shape of the data. It shows the spread of values and allows us to
identify any outliers or patterns.

On the other hand, a box plot provides a summary of the distribution, including
measures like the median, quartiles, and potential outliers, but it doesn't show the
detailed distribution of the data.

93. You have a dataset with two continuous variables,


and you want to visualize their joint distribution and
the individual distributions of each variable. Would

100 Python Interview Questions & Answer For Data Science 62


you choose a joint plot or a pair plot in Seaborn, and
why?
In this case, I would choose a pair plot in Seaborn to visualize the joint distribution
and individual distributions of the two continuous variables.
A pair plot creates a grid of subplots, where each subplot shows the relationship
between two variables through scatter plots and the distribution of each variable
using histograms or kernel density plots.

It allows us to explore the pairwise relationships between variables and gain insights
into their individual distributions. A joint plot, on the other hand, focuses on
visualizing the joint distribution and relationship between two variables in a single
plot.

94. How to create a pie chart in matplotlib?

Output:

100 Python Interview Questions & Answer For Data Science 63


In this example, we have a pie chart with four slices represented by the values in the
sizes list. Each slice is labeled with the corresponding label from the labels list.
The colors for each slice are defined in the colors list. We explode the second slice
(B) by specifying explode=(0, 0.1, 0, 0) , causing it to be slightly separated from the
rest.

The autopct='%1.1f%%' formats the percentage values displayed on each slice. The
startangle=90 rotates the chart to start from the 90-degree angle (top). After creating

the chart and adding a title, we display the pie chart using plt. show()

95. What is a violin plot in seaborn and how to create


it?
A violin plot in Seaborn is a data visualization that combines elements of a box
plot and a kernel density plot. It is used to visualize the distribution and density of a
continuous variable across different categories or groups.

The violin plot gets its name from its shape, which resembles that of a violin or a
mirrored density plot. The width of each violin represents the density or frequency of
data points at different values.

The plot is mirrored at the center, indicating the symmetry of the distribution. The
thick black line in the middle represents the median.

100 Python Interview Questions & Answer For Data Science 64


The white dot inside the violin represents the mean (optional). The thinner lines,
called “whiskers“. extend from the violin to indicate the range of the data.
Optionally, individual data points can be displayed using small points or a strip plot.
Example:

Output:

100 Python Interview Questions & Answer For Data Science 65


In this example, we use the tips dataset provided by Seaborn. We create a violin plot
using sns. violinplot() , specifying the x and y variables to visualize.
In this case, we plot the "total_bill" variable on the y-axis and group it by the "day"
variable on the x-axis. After creating the plot, we add labels and a title using
plt. xlabel() , plt. ylabel() , and plt. title() . Finally, we display the plot using

plt. show() .

96. What is a joint plot and where is it used?


A joint plot in Seaborn is a visualization that combines multiple univariate and
bivariate plots to explore the relationship between two variables. It is used to
visualize the joint distribution, individual distributions, and the correlation between
two continuous variables in a single plot.

The key features of a joint plot include:

1. Scatter plot: It displays the joint distribution of the two variables using a scatter
plot, where each data point is represented by a marker on a 2D plane.

2. Histograms: It shows the marginal distribution of each variable along the x and
y axes using histograms. These histograms represent the frequency or count of
the variable values.

100 Python Interview Questions & Answer For Data Science 66


3. Kernel density estimation (KDE) plot: It provides a smooth estimate of the
joint distribution using kernel density estimation, which is a non-parametric
technique to estimate the probability density function of a random variable.

The joint plot helps to visualize the relationship between two variables, identify
patterns, clusters, and potential outliers, and understand their individual distributions.
It also provides a visual representation of the correlation between the variables,
allowing for insights into their dependency. To create a joint plot in Seaborn, you can
use the sns. jointplot() function.

Joint plots are particularly useful when analyzing the relationship between two
continuous variables and gaining insights into their individual and joint distributions.
They provide a comprehensive view of the data in a single plot, facilitating
exploratory data analysis and hypothesis testing.

97. What are the conditions where heat maps are


used?
Heatmaps are commonly used in various scenarios to visualize and analyze data.
Here are some conditions and use cases where heatmaps are often employed:

1. Correlation Analysis: Heatmaps are widely used to visualize the correlation


between variables in a dataset. Each cell in the heatmap represents the
correlation coefficient between two variables, and the color intensity or shade
represents the strength of the correlation.

2. Confusion Matrix: Heatmaps are used to display confusion matrices,


particularly in machine learning classification tasks. Each cell in the heatmap
represents the count or percentage of correct and incorrect predictions for
different classes.

3. Geographic Data: Heatmaps are useful for visualizing geographic or spatial


data. By mapping data onto a geographical region, such as a map, heatmaps
can represent the intensity or density of a variable across different regions.

4. Performance Monitoring: Heatmaps can be employed to monitor and analyze


performance metrics over time or across different dimensions. For example, in
web analytics, a heatmap can represent user engagement or click-through rates
for different website sections or page elements.

5. Gene Expression Analysis: In genomics, heatmaps are widely used to analyze


gene expression data. Heatmaps display the expression levels of different genes
across multiple samples or conditions. This helps identify patterns, clusters, or

100 Python Interview Questions & Answer For Data Science 67


specific gene expression profiles related to certain biological conditions or
treatments.

6. Financial Analysis: Heatmaps find applications in financial analysis to visualize


market data, such as stock price movements or correlation between different
assets.

98. You have a dataset with multiple categories, and


you want to compare their proportions. Would you
choose a bar chart or a pie chart, and why?
When comparing the proportions of multiple categories, I would choose a bar chart
over a pie chart. A bar chart is better suited for comparing and ranking different
categories based on their values.

It provides a clear visual representation of the magnitude of each category, making it


easier to interpret the differences between them.

A pie chart, on the other hand, is useful for displaying the proportions of a single
categorical variable but can be less effective when comparing multiple categories.

99. How to create an area plot in matplotlib?


To create an area plot in Matplotlib, you can use the fill_between() function to fill the
area between two curves.

Here's an example of how to create an area plot:

100 Python Interview Questions & Answer For Data Science 68


Output:

100 Python Interview Questions & Answer For Data Science 69


In this example, we have two sets of data represented by values1 and values2 for
each month specified in the months list.

We create a figure and axes object using plt. subplots() . Then, we use the
fill_between() function twice to fill the area between the curves formed by values1

and values2. To customize the appearance of the area plot, you can specify the
colors using the color parameter and adjust the transparency with the alpha
parameter.

100. What is a regplot in seaborn and where is it


used?
A regplot is a function in Seaborn that allows you to create a scatter plot with a linear
regression line fit to the data. It is used to visualize the relationship between two
continuous variables and estimate the linear trend between them.

The regplot function in Seaborn combines the scatter plot and the linear regression
line in a single plot. It helps you understand the correlation and strength of the linear
relationship between the variables, as well as identify any potential outliers or
deviations from the trend line. Here are some key features of a regplot:

100 Python Interview Questions & Answer For Data Science 70


1. Scatter Plot: The regplot function creates a scatter plot by plotting the data
points of the two variables on a Cartesian coordinate system.

2. Linear Regression Line: It fits a regression line to the scatter plot using the
least squares method. The regression line represents the best-fit line that
minimizes the squared differences between the predicted and actual values.

3. Confidence Interval: By default, regplot also displays a shaded confidence


interval around the regression line. The confidence interval provides an estimate
of the uncertainty of the regression line.

4. Residual Plot: Additionally, regplot can display a separate plot showing the
residuals, which are the differences between the observed and predicted values.
This plot helps identify any patterns or heteroscedasticity in the residuals.

Regplots are commonly used in exploratory data analysis and data visualization
tasks to understand the relationship between two continuous variables.
They are helpful for detecting trends, outliers, and deviations from the linear
relationship. Regplots are often used in various fields, including statistics, social
sciences, economics, and machine learning, whenever analyzing the relationship
between variables is of interest.

To create a regplot in Seaborn, you can use the sns.regplot() function.

I hope this helps! 😀

Follow Aspersh Upadhyay for more related


to Data Science. Read my free articles on
Medium.

Get free learning resources on Bits of Data


Science related to data science, Artificial
Intelligence, Machine Learning, Python
and more.

100 Python Interview Questions & Answer For Data Science 71


Bits of Data science
Welcome Data Pioneers. Here you will get all possible
resources related to AI | ML | DL | PYTHON | and more.
Insta: instagram.com/dswithaspersh/ Medium:
https://t.me/bitsofds

100 Python Interview Questions & Answer For Data Science 72

You might also like