loc and iloc are two indexers in pandas, a popular Python library for data manipulation. They are written with square brackets (df.loc[...], not df.loc(...)) and are used to select rows and columns from a DataFrame, but they differ in how they reference and access data:
loc:
Stands for "location" and is primarily label-based.
It is used for selecting data by specifying row and column labels or boolean conditions.
The syntax is typically df.loc[row_label, column_label] or df.loc[boolean_condition].
iloc:
Stands for "integer location" and is primarily integer-based.
It is used for selecting data by specifying the integer positions of rows and columns.
The syntax is typically df.iloc[row_index, column_index].
Here's an example to illustrate the difference:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data, index=['x', 'y', 'z'])
# Using loc to select data by label
result_loc = df.loc['x', 'A'] # Selects the value at row 'x' and column 'A'
# Using iloc to select data by integer location
result_iloc = df.iloc[0, 0] # Selects the value at the first row and first column
print(result_loc) # Output: 1
print(result_iloc) # Output: 1
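The two indexers also differ when slicing, and loc accepts the boolean conditions mentioned above. The sketch below reuses the same small DataFrame. Label slices with loc include both endpoints, while position slices with iloc exclude the stop position, so the two calls below select the same rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# loc slices by label and includes BOTH endpoints: rows 'x' and 'y'
by_label = df.loc['x':'y', 'A']

# iloc slices by position and EXCLUDES the stop: positions 0 and 1 ('z' is position 2)
by_position = df.iloc[0:2, 0]

# loc also accepts a boolean condition for the rows: rows where B > 4
filtered = df.loc[df['B'] > 4, 'A']

print(by_label.tolist())     # [1, 2]
print(by_position.tolist())  # [1, 2]
print(filtered.tolist())     # [2, 3]
```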
bfill and ffill
bfill and ffill are methods in pandas used for filling missing values in a DataFrame or Series with
values from nearby rows. They are often used in data preprocessing when dealing with missing
data.
bfill stands for "backward fill." It fills each missing value with the next valid value below it (i.e., from a later row in the DataFrame), carrying that value backward (upward) into the gap.
ffill stands for "forward fill." It fills each missing value with the last valid value above it (i.e., from an earlier row in the DataFrame), carrying that value forward (downward) into the gap.
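A minimal sketch of both methods on a Series with gaps; the values are illustrative. Note that bfill cannot fill a trailing gap, because no valid value follows it, and ffill likewise cannot fill a leading gap.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()   # copy the last valid value downward
backward = s.bfill()  # copy the next valid value upward

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]  (nothing after the last NaN)
```

The same methods exist on DataFrames, where they fill each column independently.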
Common pandas functions and attributes
Data Inspection:
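1. df.head(n): Return the first n rows of a DataFrame (default n=5).
2. df.tail(n): Return the last n rows of a DataFrame (default n=5).
3. df.shape: An attribute (not a method) holding the number of rows and columns as a tuple (rows, columns).
4. df.info(): Print a concise summary of the DataFrame, including column data types, non-null counts (which reveal missing values), and memory usage.
5. df.describe(): Generate descriptive statistics (count, mean, std, min, quartiles, max) for numeric columns by default.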
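A quick sketch of the inspection calls on a small made-up DataFrame (columns A and B are illustrative). Column B contains one missing value, which shows up in the non-null count from info() and in the count row from describe().

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6],
                   "B": [10.0, 20.0, None, 40.0, 50.0, 60.0]})

print(df.head(3))   # first three rows
print(df.tail(2))   # last two rows
print(df.shape)     # (6, 2)
df.info()           # prints dtypes and non-null counts; returns None
stats = df.describe()
print(stats)        # count for B is 5 because NaN is excluded
```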
Selection and Filtering:
1. df[column_name]: Select a single column by name (returns a Series).
2. df[[col1, col2]]: Select multiple columns (returns a DataFrame).
3. df.loc[rows, columns]: Select rows and columns by label.
4. df.iloc[rows, columns]: Select rows and columns by integer position.
5. df[df['column'] > value]: Filter rows based on a condition.
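A short sketch of column selection and row filtering; the name/age/city data is hypothetical. When combining conditions, pandas uses & (and) and | (or) rather than Python's and/or, and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"],
                   "age": [28, 35, 42],
                   "city": ["Oslo", "Rome", "Oslo"]})

ages = df["age"]                 # single column -> Series
subset = df[["name", "city"]]    # list of columns -> DataFrame
older = df[df["age"] > 30]       # rows where the condition is True

# Combine conditions with & and wrap each one in parentheses
older_in_oslo = df[(df["age"] > 30) & (df["city"] == "Oslo")]

print(older["name"].tolist())          # ['Bob', 'Cara']
print(older_in_oslo["name"].tolist())  # ['Cara']
```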
Data Manipulation:
1. df.drop(columns=['col1', 'col2']): Remove specified columns.
2. df.rename(columns={'old_name': 'new_name'}): Rename columns.
3. df.sort_values(by='column_name'): Sort the DataFrame by a column.
4. df.groupby('column_name').agg(func): Group data and apply an aggregation function.
5. df.pivot_table(): Create a spreadsheet-style pivot table that summarizes values by row (index) and column keys.
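A sketch that chains the manipulation functions on a hypothetical sales table (the region, product, sales, and notes columns are made up for illustration). Each call returns a new DataFrame rather than modifying df in place, so the result is reassigned.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["pen", "pen", "ink", "ink"],
    "sales": [10, 20, 30, 40],
    "notes": ["a", "b", "c", "d"],
})

df = df.drop(columns=["notes"])                      # remove an unneeded column
df = df.rename(columns={"sales": "revenue"})         # rename a column
df = df.sort_values(by="revenue", ascending=False)   # largest revenue first

# Total revenue per region
totals = df.groupby("region").agg({"revenue": "sum"})

# Regions as rows, products as columns, summed revenue in the cells
table = df.pivot_table(values="revenue", index="region",
                       columns="product", aggfunc="sum")
print(totals)
print(table)
```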
Handling Missing Data:
1. df.isnull(): Return a boolean DataFrame that is True wherever a value is missing (df.isna() is an alias).
2. df.dropna(): Remove rows that contain any missing value (by default).
3. df.fillna(value): Fill missing values with a specific value.
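A minimal sketch of the three calls on a small DataFrame with one gap per column (values are illustrative). Chaining isnull() with sum() is a common way to count missing values per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [4.0, 5.0, np.nan]})

missing_per_column = df.isnull().sum()  # A: 1, B: 1
cleaned = df.dropna()                   # only row 0 has no missing values
filled = df.fillna(0)                   # every NaN replaced with 0

print(missing_per_column)
print(cleaned)
print(filled)
```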
Data Visualization:
1. df.plot(): Create basic plots using Matplotlib.
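A sketch of df.plot(), assuming Matplotlib is installed. It selects the non-interactive "Agg" backend so the script runs without a display; the year/sales data and the output filename sales.png are hypothetical. df.plot() returns a Matplotlib Axes object, which can be customized further or saved through its figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import pandas as pd

df = pd.DataFrame({"year": [2020, 2021, 2022],
                   "sales": [100, 150, 130]})

# One line: year on the x-axis, sales on the y-axis
ax = df.plot(x="year", y="sales", kind="line", title="Sales by year")
ax.figure.savefig("sales.png")  # hypothetical output path
```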
I/O Operations:
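1. pd.read_csv('file.csv'): Read data from a CSV file into a DataFrame.
2. df.to_csv('file.csv'): Write a DataFrame to a CSV file (pass index=False to omit the row index).
Similar functions exist for other formats, such as pd.read_excel()/df.to_excel() for Excel and pd.read_sql()/df.to_sql() for SQL databases.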
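A round-trip sketch: write a small DataFrame to CSV and read it back. It uses a temporary directory so no file is left behind; the filename file.csv matches the placeholder above. Without index=False, the row index would be written as an extra unnamed column and come back as data.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    df.to_csv(path, index=False)   # don't store the row index as a column
    loaded = pd.read_csv(path)     # read it back into a new DataFrame

print(loaded)
```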
Statistical Functions:
1. df.mean(), df.median(), df.std(), etc.: Calculate basic statistics for each column (pass numeric_only=True when the DataFrame also contains non-numeric columns).
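A small sketch of per-column statistics on illustrative data that includes a text column; numeric_only=True skips that column instead of raising an error. df.std() computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [10, 20, 30, 40],
                   "label": ["w", "x", "y", "z"]})

means = df.mean(numeric_only=True)      # A: 2.5, B: 25.0
medians = df.median(numeric_only=True)  # A: 2.5, B: 25.0
stds = df.std(numeric_only=True)        # sample standard deviation per column

print(means)
print(medians)
print(stds)
```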
Merging and Joining Data:
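1. pd.concat([df1, df2]): Concatenate DataFrames (stacks rows by default; use axis=1 to place them side by side).
2. pd.merge(df1, df2, on='key'): Perform SQL-like joins on a shared key column (an inner join by default; set how='left', 'right', or 'outer' for other join types).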
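A sketch contrasting concatenation and merging on two illustrative DataFrames that share a key column. concat simply stacks the rows (columns missing from one input are filled with NaN), while merge matches rows by key: the inner join keeps only keys present in both, and the left join keeps every key from df1.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Stack rows; ignore_index=True renumbers the result 0..5
stacked = pd.concat([df1, df2], ignore_index=True)

inner = pd.merge(df1, df2, on="key")             # keys 2 and 3 only
left = pd.merge(df1, df2, on="key", how="left")  # keys 1, 2, 3; right_val is NaN for key 1

print(stacked)
print(inner)
print(left)
```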