100 Python Interview Questions and Answers
APOORVA IYER
PYTHON BASICS AND CORE CONCEPTS
6. What is the purpose of the with statement in Python?
The with statement is used for resource management. It ensures that files or resources are properly
closed or released after use, even if an error occurs.
Example:
with open("file.txt", "r") as f:
data = f.read()
8. What is the difference between shallow copy and deep copy in Python?
Shallow Copy: Copies the outer object but not nested objects.
Deep Copy: Copies all objects recursively.
Use the copy module:
import copy
copy.copy(obj) # Shallow
copy.deepcopy(obj) # Deep
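To see the difference, here is a minimal sketch using a nested list (illustrative values):
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[0][0] = 99
print(shallow[0][0])  # 99 (the shallow copy shares the nested lists)
print(deep[0][0])     # 1 (the deep copy duplicated them)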
x = "Hello"
print(str(x)) # Hello
print(repr(x)) # 'Hello'
12. How do you read a CSV file into a DataFrame using Pandas?
Use the read_csv() function from Pandas to load CSV data:
import pandas as pd
df = pd.read_csv('filename.csv')
19. How do you handle categorical variables in Python for data analysis?
Label Encoding: Use LabelEncoder to convert categories into integers
One-Hot Encoding: Use pd.get_dummies() to create binary columns
Frequency or Target Encoding: Replace with aggregated numerical values
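A minimal sketch of the first two approaches, assuming a small illustrative 'color' column:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'green', 'red', 'blue']})

# Label encoding: each category becomes an integer
df['color_label'] = LabelEncoder().fit_transform(df['color'])

# One-hot encoding: each category becomes a binary column
one_hot = pd.get_dummies(df['color'], prefix='color')
print(one_hot)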
pd.concat(): Combines dataframes along rows or columns
pd.merge(): Joins two dataframes using keys like SQL joins
24. What is the difference between .values, .to_numpy(), .tolist(), and .array in Pandas?
.values: Older way to get NumPy array
.to_numpy(): Preferred way to convert to NumPy
.tolist(): Converts to Python list
.array: Returns internal ExtensionArray
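For example, on a simple illustrative Series:
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.to_numpy())  # array([1, 2, 3]), the preferred conversion
print(s.values)      # same result via the older accessor
print(s.tolist())    # [1, 2, 3]
print(s.array)       # the backing ExtensionArray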
30. What is the difference between merge(), join(), and concat() in Pandas?
merge(): SQL-style joins on columns
join(): Join on index
concat(): Stack DataFrames vertically or horizontally
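A compact illustration, assuming two small DataFrames that share an 'id' key:
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'id': [1, 2], 'score': [85, 92]})

merged = pd.merge(df1, df2, on='id')                    # SQL-style join on 'id'
joined = df1.set_index('id').join(df2.set_index('id'))  # index-based join
stacked = pd.concat([df1, df2], axis=0)                 # stack rows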
When working with arrays of different dimensions, NumPy "broadcasts" the smaller array across
the larger one so that they have compatible shapes. This avoids the overhead of copying data and
makes computations faster and memory efficient.
Example:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])
result = a + b
print(result)
Output:
[[11 22 33]
[14 25 36]]
In this example, array b is automatically broadcasted to match the shape of array a. The values
from b are applied row-wise to a without explicitly repeating b.
How broadcasting works:
1. Compare the shapes of both arrays from right to left.
2. Dimensions are compatible when:
   - They are equal, or
   - One of them is 1.
3. NumPy then "stretches" the smaller shape to match the larger one.
Benefits of broadcasting:
Avoids memory duplication.
Increases code readability and conciseness.
Boosts computation speed for large datasets.
Common use cases:
Adding a scalar or 1D array to a matrix.
Normalizing rows or columns in datasets.
Element-wise operations on mismatched array shapes.
Understanding broadcasting is critical in numerical computing because many high-performance
computations rely on this concept to work efficiently.
33. How do you calculate the correlation between two variables in pandas?
In data analysis, correlation measures the degree to which two variables move in relation to each
other. The most commonly used type is Pearson correlation, which captures linear relationships.
Pandas provides a built-in method called .corr() to calculate correlation coefficients between
columns of a DataFrame or between two Series.
Example:
import pandas as pd

df = pd.DataFrame({
    'height': [160, 165, 170, 175, 180],
    'weight': [55, 60, 65, 70, 75]
})
correlation = df['height'].corr(df['weight'])
print(correlation)
Output:
1.0
This output indicates a perfect positive linear correlation.
Types of correlation supported by pandas:
method='pearson' (default): measures linear correlation.
method='spearman': measures rank-based correlation (monotonic).
method='kendall': measures ordinal association.
Why this is useful:
Identifying strong or weak relationships between variables.
Detecting multicollinearity in regression analysis.
Selecting features based on correlation with the target variable.
Important Note: Correlation does not imply causation. Two variables can be correlated due to
confounding factors, so additional analysis is often needed.
34. What is the difference between np.array() and np.asarray()?
Both np.array() and np.asarray() are used to convert Python sequences (like lists or tuples) into
NumPy arrays, but they behave differently in how they handle existing arrays.
Feature               np.array()                                 np.asarray()
Always creates copy?  Yes, creates a new array by default        No, returns input if it's already an array
Efficiency            Less efficient if input is already array   More efficient for avoiding duplication
Use case              When a fresh copy is needed                When avoiding redundancy is important
Example:
import numpy as np

x = np.array([1, 2, 3])
y = np.asarray(x)
print(y is x)  # True: asarray returned the existing array without copying
4. Hashing Encoding
Hashes categories into a fixed number of columns, which reduces dimensionality and works well for very large cardinalities. Used in libraries like CategoryEncoders.
5. Embeddings (for Deep Learning)
Convert categories into dense vectors using neural networks. Useful for large-scale
recommendation systems or NLP.
6. Clustering-Based Grouping
Group rare categories into an 'Other' category or cluster similar categories using business
logic or statistical metrics.
When to choose what?
Use frequency/target encoding for tree-based models like XGBoost.
Use one-hot encoding for linear models if cardinality is manageable.
Use embeddings for deep learning or NLP problems.
📌 Pro Tip: Always validate encodings with cross-validation to ensure they improve model
performance.
📌 Multicollinearity doesn’t affect the predictive power of the model drastically but makes
interpretation unreliable.
38. What is the difference between any() and all() functions in Python?
Both any() and all() are built-in functions that work with iterables (lists, tuples, sets, etc.), typically
used in conditions and filters.
Function  Description                                    Example                     Result
any()     Returns True if at least one element is True   any([False, True, False])   True
all()     Returns True if all elements are True          all([True, True, True])     True
Use cases:
any() is often used when checking if a list contains at least one True, non-zero, or valid
value.
all() is used for validating that all inputs or conditions meet a requirement.
Example in data filtering:
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 3], 'B': [4, 5, 0]})
df['has_zero'] = df[['A', 'B']].apply(lambda x: not all(x), axis=1)
This adds a new column that flags rows with zero values.
📌 Left joins are commonly used in real-world datasets where one table has master records and
another has related transactional or lookup information.
Example:
names = ['Alice', 'Bob']
scores = [85, 92]
combined = list(zip(names, scores))
print(combined)
Output:
[('Alice', 85), ('Bob', 92)]
Use cases:
Creating dictionaries: dict(zip(keys, values))
Iterating over multiple lists simultaneously.
Unzipping data using zip(*zipped_data)
Unzipping Example:
zipped = [('a', 1), ('b', 2)]
letters, numbers = zip(*zipped)
Output:
letters = ('a', 'b')
numbers = (1, 2)
📌 zip() is memory efficient as it returns an iterator, and it’s widely used in data transformation,
pairing, and looping patterns.
41. How do you calculate the correlation between two variables in pandas?
Correlation measures the linear relationship between two variables. In pandas, you can calculate
it using the .corr() method, which by default uses the Pearson correlation coefficient.
Syntax:
df['column1'].corr(df['column2'])
Example:
import pandas as pd

df = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 55, 65, 70, 80]
})
correlation = df['height'].corr(df['weight'])
print(correlation)  # Output: 0.993 (strong positive correlation)
Types of correlation:
Pearson: Measures linear correlation (default).
Kendall and Spearman: Use .corr(method='kendall') or .corr(method='spearman') for non-parametric data.
When to use:
Use Pearson when data is normally distributed and linear.
Use Spearman/Kendall for ranked or non-linear data.
📌 A value near 1 indicates strong positive correlation; near -1 indicates strong negative
correlation; near 0 indicates no correlation.
43. How do you read a JSON file into a pandas DataFrame?
JSON (JavaScript Object Notation) is a widely used format for data exchange. You can read it into
pandas using pd.read_json().
Syntax:
df = pd.read_json('data.json')
Example JSON file:
[
  {"name": "Alice", "age": 25},
  {"name": "Bob", "age": 30}
]
Read into pandas:
import pandas as pd
df = pd.read_json('sample.json')
Output:
name age
0 Alice 25
1 Bob 30
Advanced Options:
If your JSON file is nested, you can flatten it with pandas.json_normalize() (see the sketch below).
If reading from an API, use pd.read_json(url).
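For instance, a nested record can be flattened like this (illustrative data):
import pandas as pd

data = [{"name": "Alice", "address": {"city": "Pune", "zip": "411001"}}]
df = pd.json_normalize(data)
print(df.columns.tolist())  # ['name', 'address.city', 'address.zip']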
📌 Make sure the JSON file is properly formatted—pandas expects an array of records or a dict of
dicts.
Example (grouping salaries by department):
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'IT', 'HR', 'IT'],
    'salary': [30000, 50000, 40000, 60000]
})
avg_salary = df.groupby('department')['salary'].mean()
Output:
department
HR 35000.0
IT 55000.0
Functions used with groupby:
.sum(), .mean(), .count(), .max(), .min()
.agg({'col1': 'sum', 'col2': 'mean'}) for multiple aggregations
.transform() to apply a function and keep the original DataFrame shape
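For example, using the department/salary DataFrame above (a minimal sketch):
summary = df.groupby('department').agg({'salary': ['mean', 'max']})
df['dept_avg'] = df.groupby('department')['salary'].transform('mean')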
Use cases:
Calculating KPIs per region or category
Customer segmentation
Time series analysis by day/month/year
📌 Groupby follows a split-apply-combine pattern: it splits the data into groups, applies a
function, and combines the results.
Example (removing duplicate rows):
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'name': ['Alice', 'Bob', 'Bob', 'Charlie']
})
df_clean = df.drop_duplicates()
print(df_clean)
Output:
id name
0 1 Alice
1 2 Bob
3 3 Charlie
📌 Always check for and handle duplicates before performing aggregations or model training.
📌 Proper handling of missing data is a critical part of data cleaning and preprocessing.
df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})
📌 .apply() is flexible but slower for large datasets; vectorized operations are faster when
available.
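A small comparison using the DataFrame above (sketch):
# Row-wise .apply(): flexible, but calls Python code once per row
df['sum_apply'] = df.apply(lambda row: row['A'] + row['B'], axis=1)

# Vectorized equivalent: runs in optimized C code and is much faster
df['sum_vec'] = df['A'] + df['B']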
Example (a simple decorator):
def decorator(func):
    def wrapper():
        print("Before function call")
        func()
        print("After function call")
    return wrapper

@decorator
def greet():
    print("Hello")

greet()
Output:
Before function call
Hello
After function call
Use cases:
Logging
Timing
Caching
Access control and authentication
Built-in decorators:
@staticmethod
@classmethod
@property
@lru_cache (for memoization)
📌 Decorators follow the DRY (Don't Repeat Yourself) principle and are heavily used in Flask,
Django, and FastAPI.
Example (calling a parent method with super()):
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        super().greet()  # Calls the parent method
        print("Hello from Child")

child = Child()
child.greet()
# Output:
# Hello from Parent
# Hello from Child
51. What is list comprehension and how is it used?
List comprehension is a compact and elegant way to create lists in Python. It allows for writing
iterative logic in a single line by combining for loops and optional if conditions.
Syntax:
[expression for item in iterable if condition]
Why interviewers ask:
It tests your ability to write concise, Pythonic code and your grasp on control structures and data
handling.
Example:
squares = [x**2 for x in range(5)] # [0, 1, 4, 9, 16]
# With condition
evens = [x for x in range(10) if x % 2 == 0] # [0, 2, 4, 6, 8]
Example (a custom context manager):
class MyContext:
    def __enter__(self):
        print("Entering context")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting context")

with MyContext():
    print("Inside context")
53. How do you use regular expressions in Python?
Regular expressions (regex) are used for pattern matching and text parsing. Python provides the
re module to work with regex.
Key methods:
re.search(): Searches for the pattern anywhere in the string.
re.match(): Checks for a match at the beginning.
re.findall(): Returns all matching substrings.
re.sub(): Replaces substrings matching the pattern.
Example:
import re

text = "Contact: alice@example.com"       # illustrative string
match = re.search(r"\w+@\w+\.\w+", text)
if match:
    print(match.group())                  # alice@example.com
Interview advantage:
Demonstrates your understanding of memory management, lazy evaluation, and performance-
aware design.
56. What are Python data classes and when should you use them?
Data classes, introduced in Python 3.7 via the dataclasses module, provide a simple way to define
classes that are mainly used to store data.
With the @dataclass decorator, Python automatically generates special methods like __init__(),
__repr__(), __eq__() based on the class attributes. This saves time and reduces boilerplate code.
Use Cases:
When you need lightweight classes to hold attributes (e.g., records, configuration objects)
When you want default values, type hints, and immutability features
Example:
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str = "Data"
emp = Employee("Apoorva", 29)
print(emp)
# Output: Employee(name='Apoorva', age=29, department='Data')
Interview Tip:
Be ready to explain the advantages of data classes over regular classes and how they improve
readability and productivity in data-heavy applications.
58. How do you flatten a nested list in Python?
Flattening a nested list involves converting a list of lists (or arbitrarily nested lists) into a single
list.
Using recursion:
def flatten(lst):
    result = []
    for item in lst:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result
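Usage:
nested = [1, [2, [3, 4]], 5]
print(flatten(nested))  # Output: [1, 2, 3, 4, 5]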
Example (caching recursive calls with lru_cache):
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
60. What are the most important data types in Python for analysts?
Python offers several built-in and library-supported data types that are crucial for data analysts:
int, float: For numerical calculations
str: For handling text data
list: Mutable ordered sequences
tuple: Immutable ordered sequences
set: Unordered collection of unique elements
dict: Key-value mappings
bool: True/False logic
DataFrame (from pandas): 2D tabular data structure, most commonly used in data analysis
Series (from pandas): 1D labeled array
Why this matters in interviews:
Demonstrating fluency with data types shows you're prepared to clean, transform, and model real-
world data using Python.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})
print(df.to_dict(orient='records'))
# Output: [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
Why this matters:
This operation is often used when integrating pandas with web applications, databases, or APIs,
where JSON or dictionary structures are required.
df = pd.DataFrame({
    'Sales': [100, 150, 200, 250, 300]
})
print(df.describe())
Output:
Sales
count 5.000000
mean 200.000000
std 79.056942
min 100.000000
25% 150.000000
50% 200.000000
75% 250.000000
max 300.000000
Other useful functions:
df.mean(), df.median(), df.std(), df.min(), df.max(), df.mode()
Interview Tip:
Mention that you also use .value_counts() for categorical features and .corr() to examine
relationships between variables.
Interview Insight:
Sorting is often used during reporting, ranking, or preparing data for visual dashboards. Mention
how you combine sorting with filtering and aggregation in real-world scenarios.
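For instance, a typical report-preparation pattern (assuming an illustrative 'Sales' column):
top5 = (df[df['Sales'] > 0]
        .sort_values(by='Sales', ascending=False)
        .head(5))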
66. How do you normalize data in Python?
Data normalization is a preprocessing technique used to rescale numeric values into a common
range. It is especially important in machine learning when features have different scales.
Common normalization methods:
Min-Max Scaling: Scales values to a fixed range [0, 1].
Standardization (Z-score): Centers the values around the mean with a standard deviation
of 1.
Robust Scaling: Uses median and interquartile range, good for data with outliers.
Using scikit-learn:
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
import pandas as pd

data = pd.DataFrame({'value': [10, 20, 30, 40, 50]})  # illustrative data

# Min-Max
minmax = MinMaxScaler()
print(minmax.fit_transform(data))

# Standardization
standard = StandardScaler()
print(standard.fit_transform(data))
When to use what:
Use MinMaxScaler when the data is bounded.
Use StandardScaler when your model assumes data is normally distributed (e.g., logistic
regression).
Use RobustScaler if your data contains outliers.
Example:
class Calculator:
    def add(self, a, b=0, c=0):
        return a + b + c
Method Overriding:
Subclass redefines a method from its superclass.
Example:
class Animal:
    def speak(self):
        print("Animal speaks")

class Dog(Animal):
    def speak(self):
        print("Dog barks")

d = Dog()
d.speak()  # Output: Dog barks
Interview Tip:
Clarify that overloading can be mimicked using flexible parameters, but overriding is fully
supported in Python and used for polymorphism.
TypeError
FileNotFoundError
KeyError
Best practices:
Always catch specific exceptions.
Use finally to release resources.
Avoid catching all exceptions unless necessary.
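A minimal sketch of these practices:
try:
    with open("data.txt") as f:  # illustrative file name
        value = int(f.read())
except FileNotFoundError:
    print("File is missing")     # catch specific exceptions
except ValueError:
    print("File did not contain an integer")
finally:
    print("Done")                # always runs, good for releasing resources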
Example:
import asyncio

async def greet():  # illustrative coroutine
    print("Hello")
    await asyncio.sleep(1)
    print("World")

asyncio.run(greet())
When to use:
Use coroutines when handling I/O-bound tasks like web scraping, APIs, or file operations to
improve performance without multi-threading.
Reduces memory usage.
Makes attribute access faster.
Prevents accidental creation of new attributes.
Limitation:
You can’t add attributes not listed in __slots__. Also, it doesn’t work well with inheritance unless
carefully managed.
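A minimal sketch:
class Point:
    __slots__ = ('x', 'y')  # only these attributes are allowed

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# p.z = 3  # would raise AttributeError: 'Point' object has no attribute 'z'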
71. How does Python handle memory management and garbage collection?
Python uses automatic memory management and garbage collection to efficiently allocate and
reclaim memory during program execution. This process ensures that memory no longer in use is
freed up, preventing memory leaks and optimizing performance.
Key components of memory management:
Reference Counting:
Every object in Python has an internal counter that tracks how many references point to it.
When this count reaches zero, the memory is immediately reclaimed.
Example:
a = [1, 2, 3]  # reference count = 1
b = a          # reference count = 2
del a          # reference count = 1
del b          # reference count = 0 → memory freed
Garbage Collector (GC):
Python’s gc module handles cyclic references, which reference counting alone cannot
resolve.
Cyclic garbage collection occurs periodically and identifies unreachable objects with
reference cycles.
import gc
gc.collect() # Manually triggers garbage collection
Memory Pools (pymalloc):
CPython uses an internal allocator for small memory blocks to speed up performance.
Best Practices for Interview:
Use del to explicitly remove large objects when no longer needed.
Avoid circular references when possible.
Use weak references (weakref) for caching or observer patterns where you don’t want to
increase the reference count.
📌 Summary: Python uses a combination of reference counting and cyclic garbage collection,
backed by memory pooling, to handle memory efficiently.
72. What is the Global Interpreter Lock (GIL), and how does it affect multithreading in
Python?
The Global Interpreter Lock (GIL) is a mutex (mutual exclusion lock) in CPython, the standard
implementation of Python. It ensures that only one thread executes Python bytecode at a time,
even on multi-core processors.
Why GIL exists:
Simplifies memory management by preventing race conditions in object access.
Ensures thread safety for core Python internals.
Implications:
For I/O-bound tasks (file operations, API requests), multithreading works fine because
threads release the GIL during I/O.
For CPU-bound tasks, GIL becomes a bottleneck as threads cannot run truly in parallel.
Alternatives to overcome GIL:
Use the multiprocessing module instead of threading for CPU-bound tasks (see the sketch after this list).
Use libraries like NumPy or C-extensions that internally release the GIL during
computation.
Use asyncio for I/O-bound concurrency.
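A minimal multiprocessing sketch for a CPU-bound task:
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # each worker runs in its own process, so the GIL is not a bottleneck
        print(pool.map(square, range(10)))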
📌 Interview Tip: Explain that GIL limits multi-threaded parallelism for CPU-heavy tasks, and
Python offers workarounds like multiprocessing for true parallelism.
Package:
A package is a directory containing multiple modules and an __init__.py file (can be
empty or contain initialization code). It allows for a hierarchical structure.
analytics/
├── __init__.py
├── preprocess.py
└── models.py
Importing:
from analytics.models import train_model
📌 Key Difference: A module is a single file, while a package is a folder of modules with
optional sub-packages.
📌 Interview Tip: Demonstrate your awareness of software design and how circular imports
often indicate coupling that needs refactoring.
Example:
class Meta(type):
    def __new__(cls, name, bases, dct):
        print(f"Creating class {name}")
        return super().__new__(cls, name, bases, dct)

class MyClass(metaclass=Meta):
    pass

When MyClass is defined, the Meta.__new__ method runs first.
Best Practice:
Use metaclasses only when absolutely necessary, as they add complexity. More common
alternatives include class decorators and factory functions.
📌 Interview Edge: Knowing metaclasses shows advanced Python expertise, but also mention
when not to use them.
Can lead to unpredictable behavior
Makes debugging difficult
Can break code silently if the patch fails
Safer alternatives:
Use inheritance or composition
Use mocking frameworks like unittest.mock for temporary patches
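For example, a temporary patch with unittest.mock (a sketch):
import math
from unittest.mock import patch

with patch('math.sqrt', return_value=99):
    print(math.sqrt(4))  # 99 while the patch is active
print(math.sqrt(4))      # 2.0, original behavior restored automatically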
📌 Interview Tip: While monkey patching shows flexibility, always mention that it should be
avoided in production unless absolutely necessary.
📌 Summary: Python’s memory model is abstracted from the developer, but understanding it
helps write memory-efficient and high-performance code.
Example:
import weakref

class MyClass:
    pass

obj = MyClass()
weak_obj = weakref.ref(obj)
print(weak_obj())  # <__main__.MyClass object at ...>
del obj
print(weak_obj())  # None (object has been garbage collected)
When to use:
Large graph-like structures (e.g., trees)
Applications with caching requirements
Avoiding reference cycles in design patterns
📌 Interview Edge: Understanding weak references shows you can manage memory in large or
long-running systems efficiently.
Use memoization with lru_cache:
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
Convert to an iterative approach:
Tail recursion is not optimized in Python, so rewriting as a loop avoids deep stacks (see the sketch after this list).
Use manual memoization:
Use a dictionary to cache results.
Adjust the recursion limit:
You can increase the recursion limit, but it's risky:
import sys
sys.setrecursionlimit(2000)
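An iterative version for comparison (sketch):
def fib_iter(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(10))  # 55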
📌 Best Practice: Use memoization when recursion is necessary and prefer iteration if
performance is critical.
Monitor Memory:
Use gc, tracemalloc, and memory_profiler to detect memory leaks.
📌 Interview Tip: Memory management is not just about saving space—it directly affects
performance and scalability.
Example (stack implemented with a list):
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.is_empty():
            return self.items.pop()
        return None

    def peek(self):
        if not self.is_empty():
            return self.items[-1]
        return None

    def is_empty(self):
        return len(self.items) == 0
This implementation allows you to:
push() to add an item
pop() to remove and return the top item
peek() to view the top item without removing it
is_empty() to check if the stack is empty
This structure is commonly asked about in interviews, especially when evaluating your
understanding of basic data structures.
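Usage, assuming the Stack class above:
s = Stack()
s.push(1)
s.push(2)
print(s.peek())      # 2
print(s.pop())       # 2
print(s.is_empty())  # False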
Example (queue built from two stacks):
class Queue:
    def __init__(self):
        self.in_stack = []
        self.out_stack = []

    def enqueue(self, item):
        self.in_stack.append(item)

    def dequeue(self):
        if not self.out_stack:
            while self.in_stack:
                self.out_stack.append(self.in_stack.pop())
        if self.out_stack:
            return self.out_stack.pop()
        return None
This implementation ensures that each element is transferred at most twice, resulting in amortized
O(1) time for each operation. It's an elegant approach to build a queue when only stack operations
are available.
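Usage, assuming the Queue class above:
q = Queue()
q.enqueue(1)
q.enqueue(2)
print(q.dequeue())  # 1
print(q.dequeue())  # 2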
83. How do you reverse a list without using slicing or built-in functions?
Reversing a list manually is a good way to test your understanding of loops and index
manipulation. You can iterate from the end of the list to the start and construct a new list:
Example:
def reverse_list(lst):
    reversed_lst = []
    for i in range(len(lst) - 1, -1, -1):
        reversed_lst.append(lst[i])
    return reversed_lst
This method avoids Python's slicing ([::-1]) or the reversed() function and is often used in
interviews to see if candidates understand control flow and memory allocation.
print("Locals:", locals())
print("Globals:", globals().keys())
test()
These functions are very useful in debugging, dynamic code execution (like with eval()), and in
writing meta-programming logic.
Example:
a = [1, 2, 3]
b = ['x', 'y', 'z']
zipped = list(zip(a, b))

# Unzipping
a_unzip, b_unzip = zip(*zipped)
print(list(a_unzip))  # [1, 2, 3]
print(list(b_unzip))  # ['x', 'y', 'z']
zip() is widely used in iteration, dictionary construction, and parallel processing of lists.
88. What is the difference between break and continue in Python loops?
break is used to exit a loop prematurely when a condition is met.
continue skips the current iteration and moves to the next one.
Example:
for i in range(5):
    if i == 3:
        break
    print(i)  # Output: 0, 1, 2

for i in range(5):
    if i == 3:
        continue
    print(i)  # Output: 0, 1, 2, 4
Understanding these control flow keywords is crucial in handling loop-based logic efficiently.
Example:
from functools import lru_cache

@lru_cache(maxsize=None)
def add(a, b):
    return a + b
lru_cache automatically manages the cache and is very helpful in recursive functions or frequently
repeated operations.
def decorator(func):
    def wrapper():
        print("Before function call")
        func()
        print("After function call")
    return wrapper

@decorator
def greet():
    print("Hello!")

greet()
Output:
Before function call
Hello!
After function call
In the above, @decorator is shorthand for greet = decorator(greet). The decorator function takes
another function as input and returns a new function that adds behavior before and after the
original function call.
📌 Decorators are powerful tools in Python for adding reusable logic around functions.
Example:
from functools import reduce

nums = [1, 2, 3]
print(list(map(lambda x: x * 2, nums)))     # [2, 4, 6] (illustrative map example)
print(list(filter(lambda x: x > 1, nums)))  # [2, 3]
print(reduce(lambda x, y: x + y, nums))     # 6
These tools are efficient when working with sequences of data and allow concise expressions of
common operations like transformation and reduction.
Example (Singleton via __new__):
class Singleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)  # Output: True
Here, __new__ is overridden to ensure that only one instance is created.
📌 Singleton patterns are commonly used in systems where one global shared resource is
required.
95. How do you check if two lists are identical in Python?
There are two common ways to compare lists in Python:
Using ==: Checks if the contents of both lists are the same and in the same order.
Using is: Checks if both lists are the same object in memory.
Example:
list1 = [1, 2, 3]
list2 = [1, 2, 3]
list3 = list1

print(list1 == list2)  # True (same contents)
print(list1 is list2)  # False (different objects in memory)
print(list1 is list3)  # True (same object)
📌 Always use the with statement for safer and cleaner file handling.
97. How do you find the most frequent element in a list using collections.Counter?
Python’s collections.Counter is an efficient way to count occurrences of items in an iterable and
retrieve the most common ones.
Example:
from collections import Counter
lst = [1, 2, 2, 3, 3, 3]
counter = Counter(lst)
most_common = counter.most_common(1)[0][0]
print(most_common) # Output: 3
Here, most_common(1) returns a list with the most frequent item and its count as a tuple.
📌 This is highly useful in NLP, frequency analysis, and classification problems.
RUNTIME MODIFICATION, MEMORY
MANAGEMENT, AND CODE OPTIMIZATION
Example:
import math

# Original sqrt
print(math.sqrt(4))  # Output: 2.0

# Monkey patching
math.sqrt = lambda x: "Patched!"
print(math.sqrt(4))  # Output: Patched!
This can be helpful during:
Testing (to mock certain behaviors)
Adding quick fixes to third-party libraries
Dynamic behavior insertion for prototyping
However, monkey patching can make code difficult to debug and maintain. It should be used
cautiously as it can introduce hard-to-track bugs.
📌 Use monkey patching only when absolutely necessary and document it clearly.
100. How do you check if an element exists in a list without using the in keyword?
While in is the most Pythonic way to check for existence, an alternative method is using the any()
function with a generator expression.
Example:
def exists(lst, element):
    return any(x == element for x in lst)
print(exists([1, 2, 3], 2)) # Output: True
This approach can also be used to apply custom logic during the check (like case-insensitive
comparison, partial match, etc.).
📌 This technique is often tested in interviews to assess familiarity with functional constructs and
logical reasoning.
101. How do you swap values of two variables in Python without using a temporary
variable?
Python allows variable swapping using tuple unpacking, which is concise and efficient.
Example:
a, b = 5, 10
a, b = b, a
print(a, b) # Output: 10 5
This works because the right-hand side is evaluated first, producing a tuple (b, a), which is then unpacked into a and b.
📌 Python’s ability to swap values this way is often highlighted for its elegance and is a common
interview question.
Example:
def main():
    print("Running as a script")  # illustrative entry point

if __name__ == "__main__":
    main()
Why it matters:
Prevents certain code from being executed during import
Useful in scripts that include tests or demo runs
Encourages modular, reusable code design
103. What is the purpose of the super() function in Python?
The super() function is used in object-oriented programming to call methods from a parent class. It
is commonly used in inheritance to avoid code duplication and to maintain a clean hierarchy.
Example:
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        super().greet()
        print("Hello from Child")

Child().greet()
Output:
Hello from Parent
Hello from Child
Benefits:
Supports cooperative multiple inheritance
Makes code more maintainable and scalable
Ensures that the parent class initialization is executed
Example:
import numpy as np

a = np.array([1, 2, 3])
b = np.asarray(a)
print(a is b) # Output: True
When to use which:
Use np.array() when you want to ensure independence from the original object.
Use np.asarray() for performance optimization when copying is unnecessary.
df_sample = pd.read_csv('large_file.csv', nrows=1000)
Use libraries for parallel processing:
- Dask: Handles large datasets with parallelism.
- PySpark: Ideal for distributed computing.
📌 Always profile your memory usage and tweak these parameters accordingly for optimal
performance.
df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Product': ['A', 'A', 'B', 'B'],
    'Sales': [100, 150, 200, 250]
})
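A typical pivot over this DataFrame (a minimal sketch):
pivot = df.pivot_table(index='Region', columns='Product', values='Sales', aggfunc='sum')
print(pivot)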
📌 Pivot tables are essential for data summarization and exploratory data analysis.
df['Region_Avg'] = df.groupby('Region')['Sales'].transform('mean')
Key Benefits:
Retains original DataFrame shape
Useful in creating new derived features
Works great for feature engineering in machine learning pipelines
📌 Think of transform() as the go-to function when you need per-group calculations with full
DataFrame shape preservation.
108. What is the difference between merge(), join(), and concat() in pandas?
Method    Purpose                                    Key Features
merge()   SQL-style joins using keys                 Highly flexible, supports left/right/outer/inner joins
join()    Simpler syntax for index-based joins       Joins based on the index (or a key column)
concat()  Used to stack vertically or horizontally   Concatenates DataFrames along a particular axis
Example of merge (on key column):
pd.merge(df1, df2, on='id', how='inner')
Example of join (on index):
df1.join(df2, how='left')
Example of concat:
pd.concat([df1, df2], axis=0) # Vertical stacking
pd.concat([df1, df2], axis=1) # Horizontal stacking
📌 Use merge() when dealing with key-based relational data. Use concat() when appending or
stacking similar datasets.
📌 It's particularly useful in data cleaning and wrangling pipelines where many transformations
are applied sequentially.
111. What is the difference between any() and all() in Python?
These are built-in Python functions used for evaluating Boolean conditions over iterable objects
like lists or tuples.
any(iterable): Returns True if at least one element in the iterable is truthy.
all(iterable): Returns True only if all elements in the iterable are truthy.
Examples:
any([False, True, False]) # Output: True
all([True, True, True]) # Output: True
all([True, False, True]) # Output: False
Use Cases in Data Analysis:
Checking for completeness of form inputs
Verifying that all records meet a condition
Filtering operations in custom functions
📌 These functions are concise and efficient alternatives to writing loops for condition checks.
📌 enumerate() increases clarity and avoids the need to manage counters manually during
iteration.
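For example:
for i, name in enumerate(['Alice', 'Bob'], start=1):
    print(i, name)
# 1 Alice
# 2 Bob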
113. How do you create a virtual environment in Python?
A virtual environment is a self-contained directory where you can install project-specific
dependencies without affecting global installations.
Steps to Create:
1. Install virtualenv (if not already installed):
   pip install virtualenv
2. Create a virtual environment:
   virtualenv env_name
3. Activate the environment:
   - On Windows: env_name\Scripts\activate
   - On macOS/Linux: source env_name/bin/activate
4. Install packages locally:
   pip install pandas numpy
5. Deactivate the environment:
   deactivate
📌 Virtual environments are essential for avoiding dependency conflicts and ensuring
reproducibility.
IQR Method:
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1
df = df[~((df['column'] < (Q1 - 1.5 * IQR)) | (df['column'] > (Q3 + 1.5 * IQR)))]
Boxplots and Visual Inspection:
import seaborn as sns
sns.boxplot(data=df, x='column')
Handling Methods:
Remove them
Cap or transform them
Use robust algorithms that are insensitive to outliers (e.g., tree-based models)
📌 Outlier detection should align with the business logic and not be purely statistical.
115. What are weak references in Python and where are they used?
A weak reference allows one object to refer to another object without increasing its reference
count. When the original object is no longer needed elsewhere, it can be garbage collected—even
if a weak reference to it still exists.
Use Case:
Helpful in memory-sensitive applications like caching, where you don’t want your cache to
prevent the original object from being collected.
Example:
import weakref

class MyClass:
    pass

obj = MyClass()
weak_obj = weakref.ref(obj)
print(weak_obj())  # the object, while obj is still alive
Weak references are also useful for managing object lifecycles in GUI or simulation applications.
📌 Use weak references when you want to keep a lightweight reference to an object without
preventing it from being freed.
sklearn.utils.class_weight (for computing class weights)
📌 Always validate with stratified sampling to ensure fair evaluation across classes.
117. What are hashable objects in Python and why do they matter?
A hashable object has a hash value that remains constant throughout its lifetime and supports
comparison via __eq__.
Why It Matters:
Only hashable objects can be used as dictionary keys or added to sets
Immutability is a key trait of hashable objects
Examples:
hash("hello") # Valid
hash((1, 2, 3)) # Valid
hash([1, 2, 3]) # Error: list is unhashable
Hashable Types:
int, str, float, bool
tuple (if all its elements are hashable)
Unhashable Types:
list, set, dict (mutable types)
📌 Understanding hashability helps avoid common runtime errors and design efficient data
structures.
118. What is the difference between a generator expression and a list comprehension?
Both are used to create sequences from iterables, but differ in how they evaluate and store data.
Feature       Generator Expression    List Comprehension
Syntax        (x for x in iterable)   [x for x in iterable]
Evaluation    Lazy (on-the-fly)       Eager (all at once)
Memory Usage  Efficient               High for large data
Output Type   Generator object        List
Example:
gen = (x**2 for x in range(1000000)) # Generator
lst = [x**2 for x in range(1000000)] # List
When to Use:
Use generators when dealing with large data and memory optimization is a priority
Use list comprehensions when you need a materialized sequence
📌 Prefer generators in loops, pipelines, or when the full result is not immediately needed.
📌 Profiling is critical for optimizing algorithms in data pipelines and reducing runtime on large
datasets.
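A minimal cProfile sketch (illustrative function):
import cProfile

def slow_sum(n):
    return sum(i * i for i in range(n))

cProfile.run('slow_sum(10**6)')  # prints call counts and timings per function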
Example of Overloading (simulated with default arguments):
class Math:
    def add(self, a, b=0):
        return a + b

m = Math()
print(m.add(5))      # 5 + 0 = 5
print(m.add(5, 10))  # 5 + 10 = 15
Method Overriding
Occurs when a subclass provides its own version of a method from the parent class.
Allows for custom behavior in derived classes.
Example of Overriding:
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        print("Hello from Child")

c = Child()
c.greet()  # Output: Hello from Child
📌 Overriding is widely used in polymorphism and abstract class implementations, whereas
overloading is mimicked via flexible function definitions.
Example:
class Parent:
    def greet(self):
        print("Hello from Parent")

class Child(Parent):
    def greet(self):
        super().greet()
        print("Hello from Child")

Child().greet()
Output:
Hello from Parent
Hello from Child
📌 super() is vital for extending or modifying base class methods in a clean and scalable manner.
Example:
class Book:
    def __init__(self, title):
        self.title = title

    def __str__(self):
        return f"Book: {self.title}"

b = Book("Python Basics")
print(b)  # Output: Book: Python Basics
📌 Magic methods enable operator overloading, custom iteration, and richer object behavior.
124. What are Python data classes and when should you use them?
Data classes were introduced in Python 3.7 via the dataclasses module. They reduce boilerplate
when defining classes meant to store data.
Features:
Auto-generates __init__, __repr__, __eq__, etc.
Supports default values and type hints
Example:
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    role: str = "Analyst"
Benefits:
Clean syntax
Easier comparison and representation
Good for simple data containers
📌 Use data classes for modeling data records or DTOs (Data Transfer Objects).
class Meta(type):
    def __new__(cls, name, bases, dct):
        print(f"Creating class {name}")
        return super().__new__(cls, name, bases, dct)

class MyClass(metaclass=Meta):
    pass
Output:
Creating class MyClass
📌 Metaclasses are an advanced feature best used when building frameworks or highly dynamic
systems.
Fix bugs in third-party libraries without altering the source
Mock external dependencies during testing
Example:
import math
math.sqrt = lambda x: "Patched!"
print(math.sqrt(4)) # Output: Patched!
Risks:
Hard to debug
Makes code behavior unpredictable
📌 Use monkey patching sparingly and only in controlled scenarios like testing.