Pandas_Tutorial

Pandas is a Python library for data manipulation and analysis, utilizing Series and DataFrames for efficient structured data handling. It provides functionalities for reading various data formats, data cleaning, manipulation, grouping, and merging. This tutorial covers essential operations that form the foundation of data analysis workflows using Pandas.

Uploaded by

otj7w
Copyright
© All Rights Reserved

Pandas Tutorial

### Pandas Overview


Pandas is a Python library designed for data manipulation and
analysis. It provides powerful, flexible data structures, Series and
DataFrames, for working with structured data efficiently.

---

## 1. DataFrames and Series

### Series
A Series is a one-dimensional array-like object that can hold data of
any type (integers, strings, floats, etc.), along with an associated
index. It is similar to a column in a spreadsheet, or to a dictionary
whose keys are the index labels.

Example:
```python
import pandas as pd

data = [10, 20, 30, 40]
index = ['A', 'B', 'C', 'D']
series = pd.Series(data, index=index)

print(series)
```
Output:
```
A    10
B    20
C    30
D    40
dtype: int64
```
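
The dictionary analogy can be made concrete. The following is a minimal sketch, reusing the values from the example above (the variable names are illustrative only), of building a Series from a dictionary and reading values back by label:

```python
import pandas as pd

# Building a Series from a dict: the keys become the index labels
series = pd.Series({'A': 10, 'B': 20, 'C': 30, 'D': 40})

# Label-based access, much like a dictionary lookup
value_b = series['B']

# Selecting several labels at once returns a new Series
subset = series[['A', 'D']]

# Arithmetic is applied element-wise and keeps the index
doubled = series * 2

print(value_b)
print(subset.tolist())
print(doubled['D'])
```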

### DataFrame
A DataFrame is a two-dimensional, tabular data structure with labeled
rows and columns, akin to a spreadsheet. It is essentially a collection
of Series sharing the same index.

Example:
```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
```
Output:
```
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
```
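
To illustrate the idea that a DataFrame is a collection of Series with a shared index, here is a small sketch using the same sample data. It pulls out one column and checks that its index matches the DataFrame's row index:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# Each column is a Series that shares the DataFrame's row index
age = df['Age']
print(type(age).__name__)
print(age.index.equals(df.index))

# Basic structural information: (rows, columns) and column names
print(df.shape)
print(list(df.columns))
```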

---

## 2. Reading Data

Pandas makes it easy to read and write data in various formats like
CSV, Excel, JSON, SQL, and more.

### Reading CSV Files


```python
df = pd.read_csv('data.csv') # Reads data from a CSV file
```

### Reading Excel Files


```python
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
```

### Reading JSON Files


```python
df = pd.read_json('data.json')
```
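
The readers above expect files on disk. For experimenting without files, the same functions also accept file-like objects. The sketch below uses in-memory text as a stand-in for `data.csv` and `data.json`; the contents are made up for illustration:

```python
import io

import pandas as pd

# In-memory stand-ins for 'data.csv' and 'data.json' (hypothetical contents)
csv_text = "Name,Age\nAlice,25\nBob,30\n"
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# io.StringIO wraps a string so it can be read like an open file
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```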

---

## 3. Data Cleaning

Data cleaning involves preparing raw data by handling
inconsistencies or errors.

### Dropping Rows/Columns


```python
df = df.drop(columns=['UnnecessaryColumn'])
df = df.dropna() # Drops rows with missing values
```

### Renaming Columns


```python
df = df.rename(columns={'OldName': 'NewName'})
```

### Replacing Values


```python
df['ColumnName'] = df['ColumnName'].replace({'OldValue': 'NewValue'})
```

### Changing Data Types


```python
df['Age'] = df['Age'].astype(int) # Converts to integer type; raises an error if the column has missing values
```
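
The cleaning steps above are often applied one after another. The following sketch runs them in sequence on a small, made-up DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical raw data: an unused column, an inconsistent label,
# and ages stored as strings
raw = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'Age': ['25', '30', '35'],
    'Dept': ['HR', 'Eng', 'Engineering'],
    'Unused': [None, None, None],
})

clean = raw.drop(columns=['Unused'])                           # remove the unneeded column
clean = clean.rename(columns={'name': 'Name'})                 # consistent column naming
clean['Dept'] = clean['Dept'].replace({'Eng': 'Engineering'})  # unify category labels
clean['Age'] = clean['Age'].astype(int)                        # strings -> integers

print(clean)
print(clean.dtypes)
```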

---

## 4. Data Manipulation

### Selecting Data


- By column name:
```python
df['ColumnName']
```
- By multiple columns:
```python
df[['Column1', 'Column2']]
```
- By condition:
```python
df[df['Age'] > 30]
```
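
Conditions can also be combined, and `.loc` can select rows and columns in one step. A minimal sketch on the sample data from section 1 (the result variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_and_high_paid = df[(df['Age'] > 25) & (df['Salary'] >= 70000)]

# .loc selects rows by condition and columns by name in one step
names_over_25 = df.loc[df['Age'] > 25, 'Name']

print(older_and_high_paid)
print(names_over_25.tolist())
```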

### Adding New Columns


```python
df['NewColumn'] = df['Column1'] + df['Column2']
```

### Sorting Data
```python
df = df.sort_values(by='Age', ascending=True)
```
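
Adding a column and sorting are commonly used together. The sketch below derives a column and then ranks rows by it; the `Bonus` and `TotalPay` columns are hypothetical additions to the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
    'Bonus': [5000, 2000, 1000],  # hypothetical column for illustration
})

# Derive a new column from two existing ones
df['TotalPay'] = df['Salary'] + df['Bonus']

# Sort from highest to lowest, then renumber the rows 0..n-1
ranked = df.sort_values(by='TotalPay', ascending=False).reset_index(drop=True)

print(ranked)
```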

---

## 5. Handling Missing Data

Pandas provides tools to detect and handle missing data effectively.

### Detecting Missing Data


```python
df.isnull() # Returns a DataFrame of True/False for missing values
df.isnull().sum() # Counts missing values for each column
```
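
As a small worked example of detection, the following sketch counts missing values per column and lists the affected rows. The data is made up for illustration:

```python
import pandas as pd

# None is stored as a missing value (NaN in the numeric column)
df = pd.DataFrame({
    'Age': [25, None, 35],
    'City': ['Paris', None, None],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# Rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column.to_dict())
print(rows_with_missing.index.tolist())
```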

### Filling Missing Data


- Fill with a specific value:
```python
df['ColumnName'] = df['ColumnName'].fillna(0)
```
- Fill with column mean/median/mode:
```python
# .median() works the same way; .mode() returns a Series, so use .mode()[0]
df['ColumnName'] = df['ColumnName'].fillna(df['ColumnName'].mean())
```

### Dropping Missing Data
```python
df = df.dropna() # Drops rows with missing values
```
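
The median and mode variants, and the effect of dropping rows, can be sketched as follows. The `Score` and `Grade` columns are hypothetical; note that `mode()` returns a Series, so its first entry is used:

```python
import pandas as pd

raw = pd.DataFrame({
    'Score': [10.0, None, 30.0, 100.0],
    'Grade': ['A', 'B', None, 'A'],
})

# Option 1: fill the gaps
filled = raw.copy()
filled['Score'] = filled['Score'].fillna(filled['Score'].median())   # median ignores missing values
filled['Grade'] = filled['Grade'].fillna(filled['Grade'].mode()[0])  # most frequent value

# Option 2: drop rows with any missing value, or only where 'Score' is missing
complete_rows = raw.dropna()
rows_with_score = raw.dropna(subset=['Score'])

print(filled)
print(len(complete_rows), len(rows_with_score))
```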

---

## 6. Grouping Data

Grouping allows you to aggregate data based on one or more keys.

### Group By
```python
grouped = df.groupby('Category')
```

### Aggregate Functions


```python
grouped['ColumnName'].mean() # Computes the mean for each group
grouped['ColumnName'].sum() # Computes the sum for each group
```

### Multiple Aggregations


```python
df.groupby('Category').agg({'Column1': 'mean', 'Column2': 'sum'})
```
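
To show what the aggregated result looks like, here is a sketch on a small, made-up DataFrame using the same placeholder column names. It also shows named aggregation, an alternative form of `agg` that lets you choose the output column names (`avg_col1` and `total_col2` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Column1': [10, 20, 30, 40],
    'Column2': [1, 2, 3, 4],
})

# One aggregation per column; the result is indexed by Category
summary = df.groupby('Category').agg({'Column1': 'mean', 'Column2': 'sum'})

# Named aggregation: output_name=(input_column, function)
named = df.groupby('Category').agg(
    avg_col1=('Column1', 'mean'),
    total_col2=('Column2', 'sum'),
)

print(summary)
print(named)
```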

---

## 7. Merging Data

Pandas provides several methods to merge or join datasets.

### Merging DataFrames


```python
merged_df = pd.merge(df1, df2, on='common_column')
```

### Join Types


- Inner Join (default):
Keeps only rows whose keys appear in both DataFrames.
- Outer Join:
Includes all rows from both DataFrames, filling unmatched fields with NaN.
```python
pd.merge(df1, df2, on='common_column', how='outer')
```
- Left Join:
Includes all rows from the left DataFrame.
```python
pd.merge(df1, df2, on='common_column', how='left')
```
- Right Join:
Includes all rows from the right DataFrame.
```python
pd.merge(df1, df2, on='common_column', how='right')
```
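
The difference between the join types is easiest to see from row counts. In the sketch below, the two hypothetical DataFrames share keys 2 and 3, while key 1 exists only on the left and key 4 only on the right:

```python
import pandas as pd

df1 = pd.DataFrame({'common_column': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'common_column': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = pd.merge(df1, df2, on='common_column')               # keys 2, 3
outer = pd.merge(df1, df2, on='common_column', how='outer')  # keys 1, 2, 3, 4
left = pd.merge(df1, df2, on='common_column', how='left')    # keys 1, 2, 3
right = pd.merge(df1, df2, on='common_column', how='right')  # keys 2, 3, 4

print(len(inner), len(outer), len(left), len(right))

# Key 1 has no match on the right, so its right_val is NaN in the left join
print(left)
```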

### Concatenating DataFrames


Combine rows or columns of DataFrames:
```python
pd.concat([df1, df2], axis=0) # Stacks rows
pd.concat([df1, df2], axis=1) # Combines columns
```
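
A short sketch of both directions, using made-up data. When stacking rows, the original index labels are kept by default, so `ignore_index=True` is used here to renumber them:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})

# Stack rows and renumber the index 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Combine columns side by side; rows are aligned on the index
extra = pd.DataFrame({'Salary': [50000, 60000]})
wide = pd.concat([df1, extra], axis=1)

print(stacked)
print(wide)
```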

---

### Summary
Pandas is a versatile tool for handling structured data efficiently.
Whether you're cleaning messy data, performing calculations,
or preparing data for visualization, Pandas is a go-to library in
Python. The operations covered here (reading, cleaning, manipulating,
grouping, and merging) form the foundation of data analysis workflows.
