Chapter Python Pandas-II

Chapter 2 of the document covers various functions and methods in Python Pandas for manipulating DataFrames, including iterating over rows and columns, performing binary operations, and calculating descriptive statistics. It explains the differences between functions like sum() and add(), as well as the significance of handling missing data. Additionally, it provides examples of cumulative functions and the use of inspection functions like info() and describe().

Uploaded by

pritamyadavv268

Chapter – 2

PYTHON PANDAS - II
Type A: Very Short Answer Questions

Question 1

Name the function to iterate over a DataFrame horizontally.

Answer

The iterrows() function iterates over a DataFrame horizontally.

Question 2

Name the function to iterate over a DataFrame vertically.

Answer

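The iteritems() function iterates over a DataFrame vertically, that is, column by column. In
pandas 2.0 and later, iteritems() has been removed and items() is used in its place.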

Question 3

Write equivalent expressions for the given functions :

(i) A.add(B)

(ii) B.add(A)

(iii) A.sub(B)

(iv) B.sub(A)

(v) A.rsub(B)

(vi) B.mul(A)

(vii) A.rdiv(B)

(viii) B.div(A)

(ix) B.rdiv(A)

(x) A.div(B)

Answer

(i) A.add(B) — A + B
(ii) B.add(A) — B + A

(iii) A.sub(B) — A - B

(iv) B.sub(A) — B - A

(v) A.rsub(B) — B - A

(vi) B.mul(A) — B * A

(vii) A.rdiv(B) — B / A

(viii) B.div(A) — B / A

(ix) B.rdiv(A) — A / B

(x) A.div(B) — A / B

Question 4

Is the result of sub() and rsub() the same? Why/why not ?

Answer

The sub() and rsub() functions generally produce different results because they subtract the
operands in opposite orders. The sub() function performs element-wise subtraction between
two DataFrames, subtracting the right operand from the left operand. The syntax
is <DF1>.sub(<DF2>), which means <DF1> - <DF2>. On the other hand,
the rsub() function performs reverse subtraction, subtracting the left operand from the
right operand. The syntax is <DF1>.rsub(<DF2>), which means <DF2> - <DF1>.
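
The difference can be checked with a minimal sketch. The DataFrames df1 and df2 and their values are hypothetical and are not taken from the chapter.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df1 = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
df2 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

forward = df1.sub(df2)    # same as df1 - df2
reverse = df1.rsub(df2)   # same as df2 - df1

print(forward)
print(reverse)
```

Each value in reverse is the negative of the matching value in forward, which is why the two functions only agree when every difference is zero.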

Question 5

Write appropriate functions to perform the following on a DataFrame ?

(i) Calculate the sum

(ii) Count the values

(iii) Calculate the average

(iv) Calculate the most repeated value

(v) Calculate the median

(vi) Calculate the standard deviation

(vii) Calculate the variance

(viii) Calculate the maximum value

(ix) Calculate the standard deviation


(x) Calculate the variance

Answer

(i) Calculate the sum — sum() function

(ii) Count the values — count() function

(iii) Calculate the average — mean() function

(iv) Calculate the most repeated value — mode() function

(v) Calculate the median — median() function

(vi) Calculate the standard deviation — std() function

(vii) Calculate the variance — var() function

(viii) Calculate the maximum value — max() function

(ix) Calculate the standard deviation — std() function

(x) Calculate the variance — var() function

Question 6

What does info() and describe() do ?

Answer

The info() method in pandas provides basic information about a DataFrame, including the
data types of each column, the number of rows, and the memory usage. It also displays the
index name and type. On the other hand, the describe() method provides detailed
descriptive statistics about the numerical columns in a DataFrame, including count, mean,
standard deviation, minimum value, 25th percentile, median (50th percentile), 75th
percentile, and maximum value.
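
As a rough illustration, both methods can be tried on a small DataFrame. The name marks_df and its values are assumptions made up for this sketch.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the two inspection methods
marks_df = pd.DataFrame({"marks": [50, 60, 70, 80]})

marks_df.info()                # prints index range, dtypes, non-null counts, memory usage
summary = marks_df.describe()  # returns a DataFrame of summary statistics
print(summary)
```

Note that info() only prints its report, while describe() returns a DataFrame that can be stored and used further.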

Question 7

Are sum() and add() functions the same ?

Answer

No, the sum() and add() functions are not the same in pandas. The sum() function
calculates the sum of values along a specified axis, reducing each column (or row) to a single
total. On the other hand, the add() function adds two objects element-wise and returns a
result of the same shape.
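
A short sketch of the contrast, using a hypothetical DataFrame df that is not part of the chapter:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data

col_totals = df.sum()         # one total per column
row_totals = df.sum(axis=1)   # one total per row
doubled = df.add(df)          # element-wise df + df, same shape as df

print(col_totals)
print(row_totals)
print(doubled)
```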

Question 8

Name some functions that perform descriptive statistics on a DataFrame.

Answer
Some functions that perform descriptive statistics on a DataFrame are min(), max(),
idxmax(), idxmin(), mode(), mean(), median(), count(), sum(), quantile(), std(), var().

Question 9

To consider only the numeric values for calculation, what argument do you pass to statistics
functions of Pandas ?

Answer

The numeric_only = True argument is passed to statistics functions in pandas to consider
only the numeric values (int, float, boolean columns) for calculation.
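
For instance, in the following sketch the text column is skipped. The DataFrame students and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns
students = pd.DataFrame({
    "name": ["A", "B", "C"],
    "marks": [70, 80, 90],
    "age": [15, 16, 17],
})

means = students.mean(numeric_only=True)  # the "name" column is left out
print(means)
```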

Question 10

Is there one function that calculates much of descriptive statistics values ? Name it.

Answer

Yes, there is one function that calculates many descriptive statistics values, namely the
describe() function.

Question 11

What happens if mode() returns multiple values for a column but other columns have a single
mode ?

Answer

When the mode() function in pandas returns multiple values for a column (meaning there are
multiple modes), but other columns have a single mode, the result has as many rows as the
largest number of modes. Columns with fewer modes are filled with NaN in the remaining rows.
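
A small sketch of this padding behaviour, with a hypothetical DataFrame in which column p has two modes and column q has one:

```python
import pandas as pd

# Hypothetical data: "p" has two modes (1 and 2), "q" has one mode (5)
df = pd.DataFrame({"p": [1, 1, 2, 2], "q": [5, 5, 5, 6]})

modes = df.mode()
print(modes)  # two rows; the second row of "q" is NaN
```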

Question 12

What is quantile and quartile ?

Answer

Quantile is a process of dividing the total distribution of given data into a given number of
equal proportions.

Quartile refers to the division of the total distribution of given data into a four equal
proportions with each containing one fourth of the total population.

Question 13

Name the function that lets you calculate different types of quantiles.

Answer
The function used to calculate different types of quantiles is quantile(), with syntax:
<dataframe>.quantile(q = 0.5, axis = 0, numeric_only = True).

Question 14

Name the functions that give you maximum and minimum values in a DataFrame.

Answer

The max() and min() functions find out the maximum and minimum values respectively
from a DataFrame.

Question 15

Name the functions that give you the index of maximum and minimum values in a
DataFrame.

Answer

The idxmax() and idxmin() functions find out the index of maximum and minimum values
respectively from a DataFrame.
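
The following sketch contrasts the value functions with the index functions. The DataFrame temps and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical data with day names as the row index
temps = pd.DataFrame({"temp": [21, 35, 18]}, index=["Mon", "Tue", "Wed"])

highest = temps.max()["temp"]        # the largest value
highest_at = temps.idxmax()["temp"]  # the row label where it occurs
lowest = temps.min()["temp"]
lowest_at = temps.idxmin()["temp"]

print(highest, highest_at, lowest, lowest_at)
```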

Question 16

What is missing data ?

Answer

Missing data, also known as missing values are the values that cannot contribute to any
computation or we can say that missing values are the values that carry no computational
significance.

Question 17

Why is missing data filled in DataFrame with some value ?

Answer

The dropna() method in pandas can be used to remove rows or columns with missing
values from a DataFrame. However, this can also lead to a loss of non-null data, which may
not be desirable for data analysis. To avoid losing non-null data, we can use
the fillna() method to fill missing values with a specified value. This can help to ensure
that the dataset is complete and can be used for data analysis.
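
A minimal sketch of the trade-off, assuming a hypothetical DataFrame with one missing value and a fill value of 0 chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a missing value in "a" but a valid value in "b"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

dropped = df.dropna()  # row 1 is removed entirely, so the valid 5.0 in "b" is lost
filled = df.fillna(0)  # all rows are kept; only the NaN is replaced by 0

print(dropped)
print(filled)
```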

Question 18

Name the functions you can use for filling missing data.

Answer
In Python, the fillna() function is used to fill missing data in a Pandas DataFrame.

Type B: Short Answer Questions/Conceptual Questions

Question 1

How do you iterate over a DataFrame?

Answer

To iterate over a DataFrame in Python, we can use various methods depending on our
specific needs. We can use the iterrows() method to iterate over each row in the
DataFrame. In this method, each horizontal subset is in the form of (row-index, series), where
the series contains all column values for that row-index. Additionally, we can use
the iteritems() method to iterate over each column in the DataFrame. Here, each vertical
subset is in the form of (column-index, series), where the series contains all row values for
that column-index. In pandas 2.0 and later, iteritems() has been removed and items() is used
in its place.
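
Both kinds of iteration can be sketched as follows. The DataFrame marks and its contents are hypothetical, and items() is used so that the code also runs on recent pandas versions.

```python
import pandas as pd

# Hypothetical data with student names as the row index
marks = pd.DataFrame({"maths": [90, 85], "science": [88, 92]}, index=["Asha", "Ravi"])

row_labels = []
for row_index, row in marks.iterrows():   # row is a Series of column values
    row_labels.append(row_index)
    print(row_index, row["maths"], row["science"])

col_labels = []
for col_index, col in marks.items():      # col is a Series of row values
    col_labels.append(col_index)
    print(col_index, col.tolist())
```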

Question 2

What are binary operations ? Name the functions that let you perform binary operations on a
DataFrame.

Answer

Binary operations refer to operations that require two values to perform and these values are
picked element-wise. In a binary operation involving DataFrames, the data from the two
DataFrames are aligned based on their row and column indexes. For matching row and
column indexes, the specified operation is performed, while for non-matching row and
column indexes, NaN values are stored in the result. The functions that perform binary
operations on a DataFrame include add(), radd(), sub(), rsub(), mul(), div(), and rdiv().
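
The alignment rule can be seen in a small sketch. The DataFrames left and right are hypothetical and share only the column b.

```python
import pandas as pd

# Hypothetical data: only column "b" exists in both DataFrames
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]})

total = left.add(right)  # same as left + right
print(total)             # "a" and "c" are NaN, "b" holds the sums
```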

Question 3

What is descriptive statistics ? Name Pandas descriptive statistics functions.

Answer

A descriptive statistic is a summary statistic that quantitatively describes or summarizes
features of a collection of information.

The functions that perform descriptive statistics on a DataFrame are min(), max(), idxmax(),
idxmin(), mode(), mean(), median(), count(), sum(), quantile(), std(), var().

Question 4

The info() and describe() are said to be inspection functions. What do they do ?

Answer
The info() and describe() functions in pandas are considered inspection functions
because they provide valuable insights into the structure, content, and statistical summary of
a DataFrame. The info() method in pandas provides basic information about a DataFrame,
including the data types of each column, the number of rows, and the memory usage. It also
displays the index name and type. On the other hand, the describe() method provides
detailed descriptive statistics about the numerical columns in a DataFrame, including count,
mean, standard deviation, minimum value, 25th percentile, median (50th percentile), 75th
percentile, and maximum value.

Question 5

What is the difference between sum and cumulative sum? How do you perform the two on
DataFrame ?

Answer

The sum provides the total of all numbers in a sequence, while the cumulative sum represents
the running total of numbers encountered up to a specific point in the sequence.

The sum() function gives the total sum of values along specified axes, whereas
the cumsum() function provides the cumulative sum of values along specified axes, either
column-wise or row-wise.
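
A brief sketch of both functions on a hypothetical DataFrame named sales:

```python
import pandas as pd

sales = pd.DataFrame({"q1": [10, 20, 30], "q2": [1, 2, 3]})  # hypothetical data

total = sales.sum()                  # one total per column
running = sales.cumsum()             # running totals down each column
running_rows = sales.cumsum(axis=1)  # running totals across each row

print(total)
print(running)
print(running_rows)
```

The last row of running matches total, because the running total at the final row includes every value in the column.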

Question 6

Name and explain some cumulative functions provided by Pandas.

Answer

The cumulative functions provided by Pandas are cumsum(), cumprod(), cummax(),
cummin().

1. cumsum() — It calculates cumulative sum i.e., in the output of this function, the
value of each row is replaced by sum of all prior rows including this row. The syntax
is <DF>.cumsum([axis = None]).

2. cumprod() — It calculates cumulative product of values in a DataFrame object.

3. cummax() — It calculates cumulative maximum of value from a DataFrame object.

4. cummin() — It calculates cumulative minimum of value from a DataFrame object.
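
The remaining three cumulative functions can be sketched on a single hypothetical column:

```python
import pandas as pd

df = pd.DataFrame({"v": [2, 5, 3, 4]})  # hypothetical data

products = df.cumprod()  # running product: 2, 10, 30, 120
highs = df.cummax()      # largest value seen so far: 2, 5, 5, 5
lows = df.cummin()       # smallest value seen so far: 2, 2, 2, 2

print(products)
print(highs)
print(lows)
```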

Question 7

The head() and tail() extract rows or columns from a DataFrame. Explain.

Answer

The head() function in pandas retrieves the top n rows of a DataFrame, where n is an
optional argument defaulting to 5 if not provided. It is used with the syntax <DF>.head().
Similarly, the tail() function in pandas fetches the bottom n rows of a DataFrame, where n
is also optional and defaults to 5 if not specified. Its syntax is <DF>.tail().

Question 8

Why does Python change the datatype of a column as soon as it stores an empty value (NaN)
even though it has all other values stored as integer ?

Answer

When a column in a pandas DataFrame contains a mixture of integers and missing values
(NaNs), pandas automatically changes its data type to a floating-point type. NaN is itself a
floating-point value, and the default integer types that pandas uses (NumPy integers) have no
way to represent it.
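
The change of data type can be observed directly. The two Series below are hypothetical examples.

```python
import numpy as np
import pandas as pd

whole = pd.Series([1, 2, 3])          # integers only
with_gap = pd.Series([1, 2, np.nan])  # same idea, but one value is missing

print(whole.dtype)     # an integer dtype
print(with_gap.dtype)  # a floating-point dtype
```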

Question 9

What do quantile and var() functions do ?

Answer

The quantile() function returns the values at the given quantiles over requested axis (axis
0 or 1). On the other hand, var() function computes variance and returns the unbiased
variance over the requested axis.

Question 10

What is a quartile ? How is it different from quantile ?

Answer

Quartile refers to the division of the total distribution of given data into a four equal
proportions with each containing one fourth of the total population.

Quartiles specifically divide the dataset into four equal parts, while quantiles can divide the
dataset into any desired number of equal parts.

Question 11

How do you create quantiles and quartiles in Python Pandas ?

Answer

In Python pandas, we can create quantiles and quartiles using the quantile() method.
For example, df.quantile(q = 0.25) gives the first quartile, and
df.quantile(q = [0.25, 0.5, 0.75]) gives all three quartile cut points at once.
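
A minimal sketch with a hypothetical column of five evenly spaced scores, using the default linear interpolation:

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})  # hypothetical data

median = df.quantile(q=0.5)                   # a Series with one value per column
quartiles = df.quantile(q=[0.25, 0.5, 0.75])  # a DataFrame indexed by q

print(median)
print(quartiles)
```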

Question 12

Assume that required libraries (Pandas and Numpy) are imported and DataFrame ndf has
been created as shown in solved problem 16. Predict the output produced by following code
fragment :
print(ndf[ndf["age"] > 30])
print(ndf.head(2))
print(ndf.tail(3))

The following DataFrame ndf is from solved problem 16 :


   Name      Sex  Position    City       age  Projects  Budget
0  Rabina    F    Manager     Bangalore   30        13      48
1  Evan      M    Programer   New Delhi   27        17      13
2  Jia       F    Manager     Chennai     32        16      32
3  Lalit     M    Manager     Mumbai      40        20      21
4  Jaspreet  M    Programmer  Chennai     28        21      17
5  Suji      F    Programmer  Bangalore   32        14      10

Answer

Output

   Name      Sex  Position    City       age  Projects  Budget
2  Jia       F    Manager     Chennai     32        16      32
3  Lalit     M    Manager     Mumbai      40        20      21
5  Suji      F    Programmer  Bangalore   32        14      10
   Name      Sex  Position    City       age  Projects  Budget
0  Rabina    F    Manager     Bangalore   30        13      48
1  Evan      M    Programer   New Delhi   27        17      13
   Name      Sex  Position    City       age  Projects  Budget
3  Lalit     M    Manager     Mumbai      40        20      21
4  Jaspreet  M    Programmer  Chennai     28        21      17
5  Suji      F    Programmer  Bangalore   32        14      10

Explanation

1. print(ndf[ndf["age"] > 30]): The ndf["age"] > 30 creates a boolean
mask where True indicates rows where the "age" column has values greater than 30.
Then ndf[...] uses this boolean mask to filter the DataFrame ndf, returning only
rows where the condition is True.
2. print(ndf.head(2)): It prints the first 2 rows of the DataFrame ndf.
3. print(ndf.tail(3)): It prints the last 3 rows of the DataFrame ndf.
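
The three steps can be reproduced with a cut-down version of ndf. Only the Name and age columns from the table above are rebuilt here, and the variable names mask, older, first_two and last_three are introduced just for this sketch.

```python
import pandas as pd

# Cut-down version of ndf: only the Name and age columns are reproduced
ndf = pd.DataFrame({
    "Name": ["Rabina", "Evan", "Jia", "Lalit", "Jaspreet", "Suji"],
    "age": [30, 27, 32, 40, 28, 32],
})

mask = ndf["age"] > 30    # boolean Series: True where age exceeds 30
older = ndf[mask]         # keeps rows 2, 3 and 5
first_two = ndf.head(2)   # rows 0 and 1
last_three = ndf.tail(3)  # rows 3, 4 and 5

print(older)
print(first_two)
print(last_three)
```

Rabina is excluded from the filtered result because the condition is strictly greater than 30.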

Question 13

Given the two DataFrames as :

>>> dfc1
   0  1
0  2  a
1  3  b
2  4  c
>>> dfc2
   0  1  2
0  2  3  4
2  p  q  r

Why are following statements giving errors ?

(a) print(dfc1 + dfc2)

(b) print(dfc1.sub(dfc2))

(c) print(dfc1 * dfc2)

Answer

All three statements fail for the same underlying reason: both DataFrames store strings
alongside numbers. The difference in shape is not the problem by itself, because pandas aligns
the two DataFrames on their row and column labels and fills positions that exist in only one
of them with NaN. After alignment, however, the operation is applied to each pair of matching
elements, and some of those pairs mix a string with a number or with another string.
(a) print(dfc1 + dfc2): At row 0, column 1, the addition pairs 'a' from dfc1 with 3
from dfc2. Python cannot add a string and an integer, so a TypeError is raised.
(b) print(dfc1.sub(dfc2)): This statement computes dfc1 - dfc2, that is, it subtracts dfc2
from dfc1. Subtraction is not defined for strings at all, so pairs such as 'a' - 3 raise a
TypeError.
(c) print(dfc1 * dfc2): At row 2, column 1, the multiplication pairs 'c' with 'q'. A
string can be repeated by an integer, but it cannot be multiplied by another string, so this
statement also raises a TypeError.
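
The following sketch tries the three operations. The construction of dfc1 and dfc2 is an assumed reading of the display in the question, and n1 and n2 are hypothetical numeric DataFrames added only to show that differing shapes alone do not raise an error.

```python
import pandas as pd

# dfc1 and dfc2 rebuilt from the display in the question (an assumed reading)
dfc1 = pd.DataFrame({0: [2, 3, 4], 1: ["a", "b", "c"]})
dfc2 = pd.DataFrame([[2, 3, 4], ["p", "q", "r"]], index=[0, 2])

# Hypothetical numeric DataFrames with different shapes: alignment works
n1 = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
n2 = pd.DataFrame([[10, 20, 30]], index=[0])
aligned = n1 + n2  # no error; unmatched positions become NaN

operations = [
    ("+", lambda a, b: a + b),
    ("sub", lambda a, b: a.sub(b)),
    ("*", lambda a, b: a * b),
]
failed = []
for label, op in operations:
    try:
        op(dfc1, dfc2)
    except TypeError as exc:
        failed.append(label)
        print(label, "raised", type(exc).__name__)
```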

Question 14

Consider the following code that creates two DataFrames :

ore1 = pd.DataFrame(np.array([[20, 35, 25, 20], [11, 28, 32, 29]]),
                    columns = ['iron', 'magnesium', 'copper', 'silver'])
ore2 = pd.DataFrame(np.array([[14, 34, 26, 26], [33, 19, 25, 23]]),
                    columns = ['iron', 'magnesium', 'gold', 'silver'])

What will be the output produced by the following code fragments ?

(a) print(ore1 + ore2)
    ore3 = ore1.radd(ore2)
    print(ore3)

(b) print(ore1 - ore2)
    ore3 = ore1.rsub(ore2)
    print(ore3)

(c) print(ore1 * ore2)
    ore3 = ore1.mul(ore2)
    print(ore3)

(d) print(ore1 / ore2)
    ore3 = ore1.rdiv(ore2)
    print(ore3)

Answer

(a)

Output

   copper  gold  iron  magnesium  silver
0     NaN   NaN    34         69      46
1     NaN   NaN    44         47      52
   copper  gold  iron  magnesium  silver
0     NaN   NaN    34         69      46
1     NaN   NaN    44         47      52

Explanation

1. print(ore1 + ore2): This line adds the DataFrames ore1 and ore2 using the '+' operator.
The two DataFrames have the same shape but different column names, so pandas aligns them
on row indexes and column names. Columns present in only one DataFrame (copper and gold)
contain NaN in the result.
2. ore3 = ore1.radd(ore2): This line uses the radd() method, which reverses the
operands and computes ore2 + ore1.
3. The radd() function and '+' operator produce the same result here, as the order
of operands does not affect numeric addition due to its commutative property.

(b)

Output

   copper  gold  iron  magnesium  silver
0     NaN   NaN     6          1      -6
1     NaN   NaN   -22          9       6
   copper  gold  iron  magnesium  silver
0     NaN   NaN    -6         -1       6
1     NaN   NaN    22         -9      -6

Explanation

1. print(ore1 - ore2): This line subtracts each element of ore2 from the corresponding
element of ore1. The two DataFrames have the same shape but different column names, so
pandas aligns them on row indexes and column names. Columns present in only one DataFrame
(copper and gold) contain NaN in the result.
2. ore3 = ore1.rsub(ore2): This line uses the rsub() method, which reverses the
operands and computes ore2 - ore1.
3. The rsub() function and the '-' operator do not produce the same result
because rsub() performs reverse subtraction, which means it subtracts
the left operand from the right, while '-' subtracts the right operand from the left
operand. Each value in the second output is therefore the negative of the first.
(c)

Output

   copper  gold  iron  magnesium  silver
0     NaN   NaN   280       1190     520
1     NaN   NaN   363        532     667
   copper  gold  iron  magnesium  silver
0     NaN   NaN   280       1190     520
1     NaN   NaN   363        532     667

Explanation

1. print(ore1 * ore2): This line performs element-wise multiplication
between the DataFrames ore1 and ore2 using the '*' operator. The two DataFrames have the
same shape but different column names, so pandas aligns them on row indexes and column
names. Columns present in only one DataFrame (copper and gold) contain NaN in the result.
2. ore3 = ore1.mul(ore2): This line uses the mul() method to perform the same
element-wise multiplication between ore1 and ore2, so both outputs are identical.

(d)

Output

   copper  gold      iron  magnesium    silver
0     NaN   NaN  1.428571   1.029412  0.769231
1     NaN   NaN  0.333333   1.473684  1.260870
   copper  gold  iron  magnesium    silver
0     NaN   NaN   0.7   0.971429  1.300000
1     NaN   NaN   3.0   0.678571  0.793103

Explanation

1. print(ore1 / ore2): This line performs element-wise division of ore1 by ore2
using the '/' operator. The two DataFrames have the same shape but different column names,
so pandas aligns them on row indexes and column names. Columns present in only one
DataFrame (copper and gold) contain NaN in the result.
2. ore3 = ore1.rdiv(ore2): This line uses the rdiv() method to perform reverse
division, computing ore2 / ore1. Each value in the second output is therefore the
reciprocal of the corresponding value in the first.
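
The relationship between the reverse methods and the operators can be checked with a short sketch that reuses ore1 and ore2 from the question. The dictionary named checks is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

ore1 = pd.DataFrame(np.array([[20, 35, 25, 20], [11, 28, 32, 29]]),
                    columns=['iron', 'magnesium', 'copper', 'silver'])
ore2 = pd.DataFrame(np.array([[14, 34, 26, 26], [33, 19, 25, 23]]),
                    columns=['iron', 'magnesium', 'gold', 'silver'])

# Each reverse method should match the operator form with the operands swapped
checks = {
    "radd": ore1.radd(ore2).equals(ore2 + ore1),
    "rsub": ore1.rsub(ore2).equals(ore2 - ore1),
    "rdiv": ore1.rdiv(ore2).equals(ore2 / ore1),
}
print(checks)
```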

Question 15

Consider the DataFrame wdf as shown below :

    minTemp  maxTemp  Rainfall  Evaporation
0       2.9      8.0      24.3          0.0
1       3.1     14.0      26.9          3.6
2       6.2     13.7      23.4          3.6
3       5.3     13.3      15.5         39.8
4       6.3     17.6      16.1          2.8
5       5.4     18.2      16.9          0.0
6       5.5     21.1      18.2          0.2
7       4.8     18.3      17.0          0.0
8       3.6     20.8      19.5          0.0
9       7.7     19.4      22.8         16.2
10      9.9     24.1      25.2          0.0
11     11.8     28.5      27.3          0.2
12     13.2     29.1      27.9          0.0
13     16.8     24.1      30.9          0.0
14     19.4     28.1      31.2          0.0
15     21.6     34.4      32.1          0.0
16     20.4     33.8      31.2          0.0
17     18.5     26.7      30.0          1.2
18     18.8     32.4      32.3          0.6
19     17.6     28.6      33.4          0.0
20     19.7     30.3      33.4          0.0

(a) Write statement(s) to calculate minimum value for each of the columns.

(b) Write statement(s) to calculate maximum value for each of the rows.

(c) Write statement(s) to calculate variance for column Rainfall.

(d) Write statement(s) to compute mean , mode median for last 10 rows.

Answer

(a)

>>> wdf.min()

Output

Minimum values for each column:
minTemp         2.9
maxTemp         8.0
Rainfall       15.5
Evaporation     0.0
dtype: float64

(b)

>>> wdf.max(axis=1)

Output

Maximum values for each row:
0 24.3
1 26.9
2 23.4
3 39.8
4 17.6
5 18.2
6 21.1
7 18.3
8 20.8
9 22.8
10 25.2
11 28.5
12 29.1
13 30.9
14 31.2
15 34.4
16 33.8
17 30.0
18 32.4
19 33.4
20 33.4
dtype: float64
(c)

>>> wdf['Rainfall'].var()

Output

Variance for column Rainfall: 38.852999999999994

(d)

>>> last_10_rows = wdf.tail(10)


>>> last_10_rows.mean()
>>> last_10_rows.mode()
>>> last_10_rows.median()

Output

Mean for last 10 rows:
minTemp        17.78
maxTemp        29.60
Rainfall       30.97
Evaporation     0.20
dtype: float64

Mode for last 10 rows:
   minTemp  maxTemp  Rainfall  Evaporation
0     11.8     24.1      31.2          0.0
1     13.2     26.7      33.4          NaN
2     16.8     28.1       NaN          NaN
3     17.6     28.5       NaN          NaN
4     18.5     28.6       NaN          NaN
5     18.8     29.1       NaN          NaN
6     19.4     30.3       NaN          NaN
7     19.7     32.4       NaN          NaN
8     20.4     33.8       NaN          NaN
9     21.6     34.4       NaN          NaN

Median for last 10 rows:
minTemp        18.65
maxTemp        28.85
Rainfall       31.20
Evaporation     0.00
dtype: float64
