100 Python Interview Questions & Answers For Data Science
This comprehensive list covers 100 of the most commonly asked Python
interview questions, with detailed answers and explanations.
Don't miss out on this essential resource for data science professionals.
import module_name: This allows you to access the module's contents using the
module name as a prefix. For example, if you have a module named math, you can
use functions from that module like this: math.sqrt(25).
from module_name import name: With this syntax, you can import specific functions, classes, or variables directly into
your code, without needing to use the module name as a prefix. For example: from
math import sqrt. Now you can directly use sqrt(25).
import module_name as alias: This imports the entire module but assigns it a custom name (alias) that you specify.
This can be helpful if the module name is long or conflicts with another name in your
code. For example:
import math as m
# Now you can use m.sqrt(25)
virtual environment.
2. The allocation of heap space for Python objects is done by Python's memory
manager. The core API gives the programmer access to some tools for working with it.
3. Python also has a built-in garbage collector, which recycles unused
memory so that it can be made available to the heap space again.
```
lambda arguments: expression
```
For example, the following lambda function takes a number as input and returns its
square:
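A minimal sketch of such a lambda; the name `square` is chosen here only for illustration:

```python
# Lambda that takes a number and returns its square.
square = lambda x: x ** 2

print(square(5))  # 25
```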
map(function, iterable)
Example:
filter(function, iterable)
The filter() function applies the provided function to each element of the iterable
and returns an iterator that yields the elements for which the condition is True.
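The two signatures above can be sketched side by side. The list `numbers` and the lambdas are illustrative assumptions, not taken from the original answer:

```python
numbers = [1, 2, 3, 4, 5, 6]

# map() applies the function to every element.
doubled = list(map(lambda n: n * 2, numbers))

# filter() keeps only the elements for which the function returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

print(doubled)  # [2, 4, 6, 8, 10, 12]
print(evens)    # [2, 4, 6]
```

Both functions return lazy iterators in Python 3, which is why the results are wrapped in `list()` before printing.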
This can be particularly useful when working with large datasets or when memory
constraints are a concern. By utilizing the yield keyword, generators can pause
execution and resume later, allowing for efficient and flexible processing of data.
In Python, a generator is a special type of iterator that generates values on the fly. It
allows you to write iterable objects by defining a function that uses the yield
keyword instead of return to provide values one at a time. Generators are memory-
efficient and provide a convenient way to work with large datasets or infinite
sequences. Here's an example of a simple generator function:
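A minimal sketch of a generator function; the name `count_up_to` is a hypothetical choice for illustration:

```python
def count_up_to(limit):
    """Yield the integers 1..limit one at a time."""
    n = 1
    while n <= limit:
        yield n   # pause here and hand one value to the caller
        n += 1

gen = count_up_to(3)
print(next(gen))  # 1
print(list(gen))  # [2, 3]
```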
In Python, the pickle module provides functionality for pickling and unpickling
objects.
Unpickling: Unpickling is the process of restoring a pickled byte stream back into
the original Python object. The pickle.load() function is used to unpickle an
object from a file-like object. The pickle.loads() function is used to unpickle an
object from a byte stream. Unpickling reconstructs the original object with the same
state and data as it had before pickling.
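A short round trip through a byte stream can illustrate this. The dictionary `record` is made-up sample data, and `pickle.dumps()` is the byte-stream counterpart of `pickle.loads()`:

```python
import pickle

record = {"name": "Alice", "scores": [90, 85]}  # sample data (assumed)

# Pickling: serialize the object to a byte stream.
payload = pickle.dumps(record)

# Unpickling: rebuild an equal object from the byte stream.
restored = pickle.loads(payload)

print(restored == record)  # True
print(restored is record)  # False
```

The restored object is equal to the original but is a new object. Only unpickle data from sources you trust, since unpickling can execute arbitrary code.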
2. Computation: Generators provide values lazily, which means they generate the
next value only when requested. Lists, on the other hand, are computed eagerly,
meaning all their values are computed and stored upfront.
3. Iteration: Generators are iterable objects, and you can iterate over them using a
loop or other iterable constructs. However, once you iterate over a generator and
consume its values, they cannot be accessed again. Lists, on the other hand,
can be iterated over multiple times, and their values can be accessed at any
index.
4. Size: Lists have a known length, which len() reports, and you can access individual elements directly
using indexing. Generators have no length and do not support indexing; you can only access
their elements sequentially by iterating over them.
5. Modifiability: Lists are mutable, which means you can modify, append, or
remove elements after the list is created. Generators do not store their elements,
so they do not support in-place modifications.
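The differences listed above can be sketched as follows; the variable names are illustrative assumptions:

```python
squares_list = [n * n for n in range(4)]  # eager: all values stored
squares_gen = (n * n for n in range(4))   # lazy: values produced on demand

first_pass = list(squares_gen)
second_pass = list(squares_gen)  # the generator is already exhausted

print(first_pass)       # [0, 1, 4, 9]
print(second_pass)      # []
print(squares_list[2])  # 4
```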
So, in Python 3.x, there is no difference between range() and xrange() because
xrange() no longer exists. In Python 2.x, the main difference between range() and
xrange() lies in how they generate and store sequences of numbers:
1. range(): The range() function returns a list containing all the numbers within the
specified range. For example, range(5) will return a list [0, 1, 2, 3, 4]. This means
that range() generates the entire sequence in memory, which can be memory-
intensive for large ranges.
Here's an example:
Output:
1. try: The try block is where you put the code that may raise an exception. If an
exception occurs within this block, the execution jumps to the appropriate except
block.
2. except: The except block catches specific exceptions and provides the handling
code for each exception type. You can have multiple except blocks to handle
different types of exceptions.
3. else: The else block is optional and is executed if no exception occurs in the try
block.
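A minimal sketch of these blocks working together; the function `safe_divide` and its messages are hypothetical:

```python
def safe_divide(a, b):
    try:
        result = a / b  # may raise an exception
    except ZeroDivisionError:
        return "cannot divide by zero"
    except TypeError:
        return "operands must be numbers"
    else:
        return result   # runs only if no exception occurred

print(safe_divide(10, 2))   # 5.0
print(safe_divide(1, 0))    # cannot divide by zero
print(safe_divide("a", 2))  # operands must be numbers
```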
3. iterable: A sequence, such as a list, tuple, or string, that you want to iterate
over.
4. iterable: A sequence that you want to iterate over, such as a list, tuple, or string.
If the original object contains mutable objects (e.g., lists or dictionaries), changes
made to the mutable objects in either the original or the copied object will affect both.
A deep copy creates a completely independent copy of the original object, including
all the nested objects. It recursively copies all objects found in the original object.
Changes made to the original object or its nested objects will not affect the deep
copy, and vice versa.
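The contrast can be sketched with the standard `copy` module; the nested list is sample data chosen for illustration:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)   # new outer list, shared inner lists
deep = copy.deepcopy(original)  # fully independent copy

original[0].append(99)  # mutate a nested list

print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```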
Slicing: You can reverse a list using slicing by specifying the step value as -1, which
traverses the list in reverse order. This method returns a new reversed list without
modifying the original list.
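A minimal sketch, with `my_list` as assumed sample data:

```python
my_list = [1, 2, 3, 4]
reversed_list = my_list[::-1]  # step of -1 walks the list backwards

print(reversed_list)  # [4, 3, 2, 1]
print(my_list)        # [1, 2, 3, 4]
```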
In this example, the element in my_list expression checks if element (which is set to
3) is present in my_list. Since 3 is in my_list, the condition is true, and the statement
"Element is in the list" is printed.
Here's an example:
pop() method: The pop() method removes and returns an element from a list
based on its index. If no index is specified, it removes and returns the last element.
Here's an example:
For example:
my_string = "42"
my_string = "3.14"
For example:
my_int = 42
my_float = 3.14
For example:
or float().
For example:
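file = open("example.txt", "w")  # file name assumed; the original open() call is missing
file.write("Hello, world!\n")
file.close()
lines = ["This is the first line.\n", "This is the second line.\n", "This is the third line.\n"]
file = open("example.txt", "w")  # reopened for writelines(); file name assumed
file.writelines(lines)
file.close()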
For example:
file = open("example.txt", "r")  # file name assumed; the original open() call is missing
data = file.read()
file.close()
For example:
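import os.path
if os.path.isfile("example.txt"):
    print("File exists")  # placeholder body; the original branch is missing
else:
    print("File does not exist")  # placeholder body; the original branch is missing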
For example:
import os
cwd = os.getcwd()
print(cwd)
For example:
import os
os.chdir("/path/to/new/directory")
For example:
import os
files = os.listdir("/path/to/directory")
print(files)
For example:
import os
os.mkdir("new_directory")
For example:
import os
os.rmdir("directory_to_remove")
The default metaclass is type, which is responsible for creating and
defining the behavior of all classes. However, you can create your own metaclass by
subclassing type and passing it with the metaclass keyword argument in a class
definition (Python 2 used the __metaclass__ attribute instead).
Metaclasses provide a way to modify the behavior of class creation and allow you to
add or modify attributes, methods, or behavior for classes that are created using the
metaclass.
For example:
class MyMeta(type):
    pass  # body missing in the source; a real metaclass would override __new__ or __init__
class MyClass(metaclass=MyMeta):
    pass
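As a slightly fuller sketch of what a metaclass can do, the hypothetical `TaggingMeta` below adds an attribute to every class it creates. The names `TaggingMeta`, `Report`, and `tag` are assumptions for illustration:

```python
class TaggingMeta(type):
    """Hypothetical metaclass that adds a `tag` attribute to each class it creates."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.tag = name.lower()  # attribute injected at class-creation time
        return cls


class Report(metaclass=TaggingMeta):
    pass


print(Report.tag)                   # report
print(type(Report) is TaggingMeta)  # True
```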
files = os.listdir("/path/to/directory")
For example:
Here's an example:
In the above code, 'filename.txt' represents the name or path of the file you want to
count the capital letters in.
The open() function is used to open the file, and the file is iterated line by line using
the first for loop (for line in open('filename.txt')).
Then, for each line, the characters are iterated using the second for loop (for char in
line).
The char.isupper() condition checks if the character is uppercase. The generator
expression 1 for line in open('filename.txt') for char in line if char.isupper()
generates 1 for each uppercase character.
Finally, the sum() function is used to add up all the 1 occurrences, resulting in the
count of capital letters, which is stored in the count variable.
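The approach described above can be sketched as follows. This version differs from the description in two assumed ways: it writes a small temporary sample file so that it is self-contained, and it opens the file in a `with` block so the file is closed afterwards:

```python
import os
import tempfile

# Create a small sample file (contents are made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World\nPython IS Fun\n")
    path = tmp.name

with open(path) as f:
    # Emit 1 for each uppercase character, then add them up.
    count = sum(1 for line in f for char in line if char.isupper())

os.remove(path)
print(count)  # 6
```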
Due to the vast range of capabilities, NumPy has become very popular and is the
most preferred package. The following image represents the uses of NumPy.
NumPy arrays are treated as objects which results in minimal memory usage.
Since Python keeps track of objects by creating or deleting them based on the
NumPy also provides a wide range of functions for bitwise operations, string
operations, linear algebra, arithmetic operations, etc. These are
not provided for Python's built-in lists.
When the size of ndarrays is changed, it results in a new array and the original
array is deleted.
import numpy as np
import numpy as np
my_array = np.arange(1, 6)
import numpy as np
my_array = np.linspace(1, 5, 5)
Using indexing and slicing: For a 1D array, you can use the [::-1] slicing to
reverse the array:
import numpy as np
Using the np.flip() function: The np.flip() function can be used to reverse an
array along a specified axis. By default, it reverses the array along all axes.
Here's an example:
import numpy as np
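Both approaches can be sketched together; the arrays `arr` and `grid` are assumed sample data:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr[::-1])     # [5 4 3 2 1]
print(np.flip(arr))  # [5 4 3 2 1]

grid = np.array([[1, 2], [3, 4]])
rows_flipped = np.flip(grid, axis=0)  # reverses the row order only
all_flipped = np.flip(grid)           # no axis given: reverses along every axis
```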
It is particularly useful when dealing with discrete data or integer-valued data. The
function operates on 1D arrays and returns a new array with the count of
occurrences for each integer value.
Example:
import numpy as np
print(bin_counts)
Output:
[0 1 4 1 1 1]
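The input array for this example is missing from the text. One input that produces these counts, chosen here as an assumption, is sketched below with `np.bincount()`:

```python
import numpy as np

values = np.array([1, 2, 2, 2, 2, 3, 4, 5])  # assumed input
bin_counts = np.bincount(values)  # index i holds the number of times i occurs
print(bin_counts)  # [0 1 4 1 1 1]
```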
3. Panel - Pandas historically had a third type of data structure known as Panel (deprecated
and later removed from the library), which is a 3D data structure capable of storing
heterogeneous data but it isn't that widely
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
It can hold any data type such as integers, floats, and strings. Its values are
mutable, i.e. they can be changed, but the size of the Series is immutable, i.e. it cannot be
changed.
By using the pd.Series() constructor, we can easily convert a list, tuple, or dictionary into a
Series. A Series cannot contain multiple columns.
import pandas as pd
dataframe = pd.DataFrame( data, index, columns, dtype)
data - It can take various forms, such as a Series, map, ndarray, list, dict, etc.
From a Python list: You can create a Series by passing a Python list to the
pd.Series() constructor:
From a NumPy array: You can create a Series from a NumPy array by passing
the array to the pd.Series() constructor:
From a dictionary: You can create a Series from a dictionary, where the keys of
the dictionary will be the index labels of the Series and the values will be the
data:
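The three construction routes above can be sketched as follows; the values are sample data chosen for illustration:

```python
import numpy as np
import pandas as pd

from_list = pd.Series([10, 20, 30])
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})  # keys become index labels

print(from_list.tolist())        # [10, 20, 30]
print(from_dict.index.tolist())  # ['a', 'b', 'c']
```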
When deep=True, a new object is created with a copy of the calling
object's data and indices.
Modifications to the data or indices of the copy will not be reflected in the original
object. When deep=False, a new object is created
without copying the calling object's data or index, i.e. only the references to the
data and index are copied.
In that case, changes made to the data of the original object will be reflected in the shallow
copy and vice versa (in pandas versions where Copy-on-Write is not enabled).
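A minimal sketch of the deep-copy case, which behaves the same across pandas versions; the Series contents are assumed sample data:

```python
import pandas as pd

original = pd.Series([1, 2, 3], name="values")
deep_copy = original.copy(deep=True)

deep_copy.iloc[0] = 100  # modify the copy only

print(original.tolist())   # [1, 2, 3]
print(deep_copy.tolist())  # [100, 2, 3]
```

The shallow case is left out of the sketch on purpose, since its observable behavior depends on whether Copy-on-Write is active.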
Examples are gender, social class, blood type, country affiliation, observation
time, etc. There is no hard and fast rule for how many values a categorical variable
should have. One should apply one's domain knowledge to make that determination
for the data set at hand.
Using read_csv(): CSV is a comma-separated file i.e. any text file that uses
commas as a delimiter to separate the record values for each field. Therefore, in
order to load data from a text file we use pandas.read_csv() method.
Using read_table(): This function is very much like the read_csv() function, the
major difference being that in read_table the delimiter value is ‘\t’ and not a
Using read_fwf(): The name stands for fixed-width formatted lines. This function reads
a table of such lines into a DataFrame. It also supports
optionally iterating over the file or breaking it into chunks. Because the columns in the text
file are separated with a fixed width, read_fwf() can read the contents
into separate columns.
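The three readers can be sketched on in-memory text. The sample contents and the use of `io.StringIO` in place of real files are assumptions made to keep the example self-contained:

```python
import io
import pandas as pd

csv_text = "name,score\nAda,90\nLinus,85\n"
tsv_text = "name\tscore\nAda\t90\nLinus\t85\n"
fwf_text = "name  score\nAda      90\nLinus    85\n"

df_csv = pd.read_csv(io.StringIO(csv_text))    # comma-delimited
df_tsv = pd.read_table(io.StringIO(tsv_text))  # tab-delimited by default
df_fwf = pd.read_fwf(io.StringIO(fwf_text))    # column boundaries inferred from widths

print(df_csv.equals(df_tsv))  # True
```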
iloc[]: This indexer allows you to access data by specifying the integer-based positions of rows
and columns. It is used with square brackets rather than called like a function. The indexing
starts from 0 for both rows and columns. You can use
integer-based slicing and indexing ranges to select specific rows or columns.
iloc[] does not include the end value when slicing with ranges. Here's an
example to illustrate the usage of iloc[]:
loc[]: The loc[] indexer is primarily used for label-based indexing. It allows you to
access data by specifying labels or Boolean conditions for rows and column names.
You can use label-based slicing and indexing ranges to select specific rows or
columns.
loc[] includes the end value when slicing with ranges. Here's an
example to illustrate the usage of loc[]:
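A minimal sketch contrasting position-based and label-based selection; the DataFrame contents are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Oslo", "Lima", "Pune"]},
    index=["a", "b", "c"],
)

by_position = df.iloc[0:1]                # position 0 only (end excluded)
by_label = df.loc["a":"b"]                # labels 'a' through 'b' (end included)
over_30 = df.loc[df["age"] > 30, "city"]  # Boolean condition plus column label

print(by_position.index.tolist())  # ['a']
print(by_label.index.tolist())     # ['a', 'b']
print(over_30.tolist())            # ['Lima', 'Pune']
```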
The pd.cut() function allows you to divide a continuous variable into bins and assign
discrete labels to the values based on their bin membership.
Example: How you can use pd.cut() to convert continuous values into discrete
categories:
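A minimal sketch, assuming hypothetical age data, bin edges, and labels:

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 64, 80])

# Bins are right-inclusive by default: (0, 18], (18, 65], (65, 120].
groups = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

print(groups.tolist())  # ['child', 'child', 'adult', 'adult', 'senior']
```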
fillna(): The fillna() function in Pandas is used to fill missing values with a specified
scalar value or with values from another DataFrame or Series. The function replaces
the missing values with the provided scalar value or with values from a specified
Series or DataFrame.
Here's an example of using fillna() to fill missing values with a constant value:
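A minimal sketch with an assumed two-column DataFrame, filling every missing value with 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
filled = df.fillna(0)  # returns a new DataFrame; df itself is unchanged

print(filled["a"].tolist())  # [1.0, 0.0, 3.0]
print(filled["b"].tolist())  # [0.0, 5.0, 6.0]
```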
Output:
Using loc[] indexing: Another approach is to use the loc[] indexing method to
directly assign values to a new row.
In the above program, we create a sample pandas Series named series containing
some numbers. We then use the % operator to check for multiples of 5 by applying
the condition series % 5 == 0.
This condition returns a Boolean Series with True values where the numbers are
multiples of 5 and False values otherwise. Next, we use numpy.where(), which returns a
tuple of arrays, and take its first element ([0]) to retrieve the positions of the True
values in the Boolean Series.
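The steps described can be sketched as follows; the numbers in `series` are assumed sample data:

```python
import numpy as np
import pandas as pd

series = pd.Series([3, 10, 7, 25, 40, 8])

is_multiple = series % 5 == 0         # Boolean Series
positions = np.where(is_multiple)[0]  # np.where returns a tuple; take element 0

print(positions.tolist())  # [1, 3, 4]
```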
In the above program, we create a sample pandas Series named series with some
values. We use the value_counts() function to count the occurrences of each value in
the Series and then retrieve the most frequent value using idxmax().
The idxmax() function returns the index label of the maximum value, which
corresponds to the most frequent value in this case. Next, we use Boolean indexing
(series != most_frequent) to create a mask of values that are not equal to the most
frequent value.
We use this mask to select those values from the Series and replace them with
"replaced" using the replace() function.
Output:
Features of Matplotlib:
1. Plotting Functions: Matplotlib provides a comprehensive set of plotting
functions that allow you to create various types of plots, including line plots,
scatter plots, bar plots, histogram plots, pie charts, and more.
For example, if you were looking to show the relationship between a person’s age
and their weight, a scatter graph would be more appropriate than a line chart. A line
chart would be more appropriate than a scatter graph when you are looking to show
a trend over time.
For example, if you were looking at the monthly sales of a company over the course
of a year, a line chart would be more appropriate than a scatter graph.
1. Titles and Labels: Set the title of the plot using plt.title() or ax.set_title(). Set
labels for the x-axis and y-axis using plt.xlabel() and plt.ylabel() or
ax.set_xlabel() and ax.set_ylabel().
2. Legends: Add a legend to your plot using plt.legend() or ax.legend().
Customize the legend location, labels, and other properties.
3. Grid Lines: Display grid lines on the plot using plt.grid(True) or ax.grid(True).
Customize the grid appearance with options like linestyle, linewidth, and
color.
4. Colors, Line Styles, and Markers: Control the colors of lines, markers, and
other plot elements using the color parameter in plotting functions. Customize
line styles (e.g., solid, dashed, dotted) using the linestyle parameter. Specify
markers (e.g., dots, triangles, squares) using the marker parameter.
5. Axis Limits and Ticks: Set custom axis limits using plt.xlim() and
plt.ylim() or ax.set_xlim() and ax.set_ylim().
Customize the appearance of ticks on the x-axis and y-axis using plt.xticks()
6. Background and Plot Styles: Change the background color of the plot using
plt.figure(facecolor='color') or ax.set_facecolor('color').
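Several of the options listed above can be combined in one sketch. The data, colors, and text are assumptions, and the Agg backend is selected only so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption for headless runs)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8], color="green", linestyle="--", marker="o", label="growth")
ax.set_title("Sample plot")                  # title
ax.set_xlabel("x")                           # axis labels
ax.set_ylabel("y")
ax.set_xlim(0, 4)                            # axis limits
ax.grid(True, linestyle=":", linewidth=0.5)  # grid lines
ax.set_facecolor("whitesmoke")               # background color
ax.legend(loc="upper left")                  # legend
```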
Output:
We plot the data on each subplot using the respective axes objects. Customizations
such as titles, labels, and legends are set using methods like set_title(),
set_xlabel(), set_ylabel(), and legend().
We use plt.tight_layout() to adjust the spacing between the subplots for better
visualization.
Finally, we use plt.show() to display the figure with the subplots.
In this example, we import the necessary modules, including seaborn as sns and
numpy as np. We create a 2D array of random values using np.random.rand().
This will serve as our data for the heatmap. We use the sns.heatmap() function to
create the heatmap. The data array is passed as the first argument.
Output:
This dataset contains information about restaurant tips. We use the sns.catplot()
The x parameter specifies the variable to be plotted on the x-axis ('day' in this
example), the y parameter specifies the variable to be plotted on the y-axis
('total_bill' in this example), the data parameter specifies the dataset to use (the
'tips' dataset in this example), and the kind parameter specifies the type of
categorical plot to create ('box' plot in this example).
Output:
In this example, we import the necessary modules, including seaborn as sns and
numpy as np. We generate a random dataset using np.random.randn().
Finally, we use plt.show() to display the distribution plot. When you run this code, it
will create a distribution plot based on the provided random dataset.
The plot will show the estimated probability density function (PDF) using a kernel
density estimation (KDE) line, as well as a histogram representation of the data.
On the other hand, a box plot provides a summary of the distribution, including
measures like the median, quartiles, and potential outliers, but it doesn't show the
detailed distribution of the data.
It allows us to explore the pairwise relationships between variables and gain insights
into their individual distributions. A joint plot, on the other hand, focuses on
visualizing the joint distribution and relationship between two variables in a single
plot.
Output:
The autopct='%1.1f%%' formats the percentage values displayed on each slice. The
startangle=90 rotates the chart to start from the 90-degree angle (top). After creating
the chart and adding a title, we display the pie chart using plt.show().
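A minimal sketch of a pie chart using these two parameters. The labels, sizes, and title are hypothetical, and the Agg backend is selected only so the code runs without a display (so `plt.show()` is omitted):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption for headless runs)
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]  # hypothetical categories
sizes = [50, 30, 20]      # hypothetical values

fig, ax = plt.subplots()
# With autopct set, pie() also returns the percentage text objects.
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct="%1.1f%%", startangle=90)
ax.set_title("Share by category")

print([t.get_text() for t in autotexts])  # ['50.0%', '30.0%', '20.0%']
```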
The violin plot gets its name from its shape, which resembles that of a violin or a
mirrored density plot. The width of each violin represents the density or frequency of
data points at different values.
The density is mirrored about the central axis, so each violin is symmetric by
construction; this does not mean the underlying data are symmetric. In Seaborn's
default style, the thick bar in the middle shows the interquartile range and the marker
inside it shows the median.
Output:
plt.show().
1. Scatter plot: It displays the joint distribution of the two variables using a scatter
plot, where each data point is represented by a marker on a 2D plane.
2. Histograms: It shows the marginal distribution of each variable along the x and
y axes using histograms. These histograms represent the frequency or count of
the variable values.
The joint plot helps to visualize the relationship between two variables, identify
patterns, clusters, and potential outliers, and understand their individual distributions.
It also provides a visual representation of the correlation between the variables,
allowing for insights into their dependency. To create a joint plot in Seaborn, you can
use the sns.jointplot() function.
Joint plots are particularly useful when analyzing the relationship between two
continuous variables and gaining insights into their individual and joint distributions.
They provide a comprehensive view of the data in a single plot, facilitating
exploratory data analysis and hypothesis testing.
A pie chart, on the other hand, is useful for displaying the proportions of a single
categorical variable but can be less effective when comparing multiple categories.
We create a figure and axes object using plt.subplots(). Then, we use the
fill_between() function twice to fill the area between the curves formed by values1
and values2. To customize the appearance of the area plot, you can specify the
colors using the color parameter and adjust the transparency with the alpha
parameter.
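The description above can be sketched as follows. The contents of `x`, `values1`, and `values2` and the chosen colors are assumptions, and the Agg backend is selected only so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption for headless runs)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
values1 = [1, 3, 2, 4]  # hypothetical data
values2 = [2, 4, 3, 6]  # hypothetical data

fig, ax = plt.subplots()
# First band: from the x-axis up to values1.
ax.fill_between(x, 0, values1, color="skyblue", alpha=0.5, label="values1")
# Second band: between values1 and values2.
ax.fill_between(x, values1, values2, color="salmon", alpha=0.5, label="values2")
ax.legend()
```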
The regplot function in Seaborn combines the scatter plot and the linear regression
line in a single plot. It helps you understand the correlation and strength of the linear
relationship between the variables, as well as identify any potential outliers or
deviations from the trend line. Here are some key features of a regplot:
2. Linear Regression Line: It fits a regression line to the scatter plot using the
least squares method. The regression line represents the best-fit line that
minimizes the squared differences between the predicted and actual values.
4. Residual Plot: Additionally, Seaborn's separate residplot() function can display a plot of the
residuals, which are the differences between the observed and predicted values.
This plot helps identify any patterns or heteroscedasticity in the residuals.
Regplots are commonly used in exploratory data analysis and data visualization
tasks to understand the relationship between two continuous variables.
They are helpful for detecting trends, outliers, and deviations from the linear
relationship. Regplots are often used in various fields, including statistics, social
sciences, economics, and machine learning, whenever analyzing the relationship
between variables is of interest.