Commit a45c6e4

Merge pull request animator#549 from kRiShNa-429407/main
Add content: Pandas introduction and DataFrames
2 parents ed4b85e + 6ec6fd6 commit a45c6e4

2 files changed, +245 -0 lines changed

contrib/pandas/index.md

+1
@@ -1,5 +1,6 @@
# List of sections

- [Pandas Introduction and Dataframes in Pandas](introduction.md)
- [Pandas Series Vs NumPy ndarray](pandas_series_vs_numpy_ndarray.md)
- [Pandas Descriptive Statistics](Descriptive_Statistics.md)
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)

contrib/pandas/introduction.md

+244
@@ -0,0 +1,244 @@
# Introduction to the Pandas Library and DataFrames

**Now that you have learnt Python programming, it's time for some applications.**

- Machine Learning and Data Science are fast-growing fields. To work in them, your first step should be `Data Science`, as Machine Learning is all about data.
- To begin with Data Science, your first tool will be the `Pandas Library`.

## Introduction of Pandas Library

Pandas is a data analysis and manipulation tool built on top of the Python programming language. Pandas got its name from the term "panel data" (‘Pa’ from Panel and ‘da’ from data). Panel data is an econometrics term for data sets that contain observations of the same subjects over several time periods. In practice, pandas works with any data that has rows and columns, such as Excel spreadsheets and CSV files.

**To use Pandas, we first have to import it.**

## Why pandas?

* Pandas provides a simple-to-use but very capable set of functions that you can use on your data.
* It is also integrated with other machine learning libraries, so it is important to learn it.
* For example, it is widely used to transform the data that a machine learning model will use during training.
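
As one possible illustration of the data transformation mentioned above, the sketch below turns a text column into numeric indicator columns with `pd.get_dummies`, a common preparation step before training a model. The `cars_table` name and its values are made up for this example, and the code begins with the import statement that this tutorial introduces.

```python
import pandas as pd

# Hypothetical toy data, not taken from a real data set
cars_table = pd.DataFrame({
    "Cars": ["BMW", "Audi", "Thar", "Honda"],
    "Colour": ["Black", "White", "Red", "Green"],
})

# Replace the text column "Colour" with one indicator column per colour
encoded = pd.get_dummies(cars_table, columns=["Colour"])

print(list(encoded.columns))
print(encoded.shape)  # 4 rows; "Cars" plus one column per colour
```

Many machine learning libraries expect numeric input, which is why a step like this often comes first. It is only one of several ways to encode categories.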

```python
# Import pandas under its conventional alias
import pandas as pd
```

*To import any module in Python, use the `import module_name` statement. I used `pd` as the abbreviation for pandas so that we don't need to type `pandas` every time; typing `pd` is enough.*

```python
# Check the installed pandas version
print(f"Pandas Version is : {pd.__version__}")
```

    Pandas Version is : 2.1.4

## Understanding Pandas data types

Pandas has two main data types: `Series` and `DataFrame`.

* `pandas.Series` is a 1-dimensional column of data.
* `pandas.DataFrame` is a 2-dimensional data table with rows and columns.

### 1. Series datatype

**To create a Series, use `pd.Series()` and pass a Python list inside the parentheses.**

Note: The `S` in `Series` is a capital letter. Writing `pd.series` with a lowercase `s` raises an `AttributeError`.
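
The error described in the note can be reproduced with a short sketch. Python attribute names are case-sensitive, so the lowercase name is simply not found on the `pandas` module. The variable names `error_message` and `two_cars` are only illustrative.

```python
import pandas as pd

# The lowercase name does not exist on the pandas module
try:
    pd.series(["Honda", "Audi"])
except AttributeError as exc:
    error_message = str(exc)
    print(f"AttributeError: {error_message}")

# The correctly capitalised constructor works
two_cars = pd.Series(["Honda", "Audi"])
print(len(two_cars))  # 2
```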

> Let's create a series

```python
# Creating a series of car companies
cars = pd.Series(["Honda", "Audi", "Thar", "BMW"])
cars
```

    0    Honda
    1     Audi
    2     Thar
    3      BMW
    dtype: object

The above code creates a Series of car companies and stores it in the variable `cars`. The call `pd.Series(["Honda", "Audi", "Thar", "BMW"])` means: hey pandas (`pd`), create a Series of cars named Honda, Audi, Thar and BMW.

The default index of a Series is 0, 1, 2, … (remember, it starts from 0).

To change the index of a Series, set the `index` parameter. It takes a list of index values:

```python
cars = pd.Series(["Honda", "Audi", "Thar", "BMW"], index=["A", "B", "C", "D"])
cars
```

    A    Honda
    B     Audi
    C     Thar
    D      BMW
    dtype: object

You can see that the index has changed from numbers to A, B, C and D.
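
Once a Series has custom labels, those labels can be used to look values up. The sketch below rebuilds the same `cars` Series and reads values by label and by position. It is a minimal illustration, not the only way to access data.

```python
import pandas as pd

cars = pd.Series(["Honda", "Audi", "Thar", "BMW"], index=["A", "B", "C", "D"])

# Look up a value by its index label
print(cars["B"])         # Audi

# .loc is the explicit label-based accessor, .iloc is position-based
print(cars.loc["C"])     # Thar
print(cars.iloc[0])      # Honda

# The index labels themselves can be listed
print(list(cars.index))  # ['A', 'B', 'C', 'D']
```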

The `dtype` shown at the bottom of the output tells us the type of data stored in the Series.
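
To see how the `dtype` follows the data, here is a small sketch with two Series of numbers. The `heights` values are hypothetical and chosen only to show a different dtype.

```python
import pandas as pd

# Hypothetical values, used only to show different dtypes
ages = pd.Series([19, 20, 21, 24])
heights = pd.Series([1.70, 1.65, 1.80, 1.75])

print(ages.dtype)     # an integer dtype (int64 in the outputs shown in this tutorial)
print(heights.dtype)  # float64
```

A Series of whole numbers gets an integer dtype, while decimal numbers get a floating-point dtype.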

### 2. DataFrame datatype

A DataFrame contains rows and columns, like a CSV file.

You can create a DataFrame by using `pd.DataFrame()` and passing it a Python dictionary.

```python
# Let's create a DataFrame from a dictionary of lists
cars_with_colours = pd.DataFrame({"Cars": ["BMW", "Audi", "Thar", "Honda"],
                                  "Colour": ["Black", "White", "Red", "Green"]})
print(cars_with_colours)
```

        Cars  Colour
    0    BMW   Black
    1   Audi   White
    2   Thar     Red
    3  Honda   Green

Each dictionary key becomes a `column name` and each value becomes the `column data`.
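
The mapping from dictionary keys to columns can be checked directly. This sketch rebuilds the same DataFrame and inspects its column labels, one selected column and its shape. The variable name `colours` is only illustrative.

```python
import pandas as pd

cars_with_colours = pd.DataFrame({
    "Cars": ["BMW", "Audi", "Thar", "Honda"],
    "Colour": ["Black", "White", "Red", "Green"],
})

# Each dictionary key became a column label
print(list(cars_with_colours.columns))  # ['Cars', 'Colour']

# Selecting a single column returns a Series
colours = cars_with_colours["Colour"]
print(type(colours).__name__)           # Series

# shape is (number of rows, number of columns)
print(cars_with_colours.shape)          # (4, 2)
```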

*You can also create a DataFrame from Series.*

```python
# Let's create two series
students = pd.Series(["Ram", "Mohan", "Krishna", "Shivam"])
age = pd.Series([19, 20, 21, 24])

students
```

    0        Ram
    1      Mohan
    2    Krishna
    3     Shivam
    dtype: object

```python
age
```

    0    19
    1    20
    2    21
    3    24
    dtype: int64

```python
# Now let's create a DataFrame from the two Series above:
# pass each Series as a dictionary value

record = pd.DataFrame({"Student_Name": students,
                       "Age": age})
print(record)
```

       Student_Name  Age
    0           Ram   19
    1         Mohan   20
    2       Krishna   21
    3        Shivam   24
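
The rows line up here because both Series share the default index 0 to 3. When a DataFrame is built from Series, pandas matches values by index label, not by position. The sketch below uses deliberately mismatched, hypothetical indexes to show the effect; the names `names_shifted`, `age_shifted` and `misaligned` are made up for this example.

```python
import pandas as pd

# Hypothetical Series whose index labels only partly overlap
names_shifted = pd.Series(["Ram", "Mohan", "Krishna"], index=[0, 1, 2])
age_shifted = pd.Series([19, 20, 21], index=[1, 2, 3])

misaligned = pd.DataFrame({"Student_Name": names_shifted, "Age": age_shifted})
print(misaligned)

# Labels 0 and 3 each appear in only one Series, so the other column is missing there
print(misaligned.isna().sum())
```

The result has one row per label found in either Series, and cells without a matching label are filled with a missing value.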

```python
# Get the column names
record.columns
```

    Index(['Student_Name', 'Age'], dtype='object')

### Describe Data

**The good news is that pandas has many built-in functions which allow you to quickly get information about a DataFrame.**
Let's explore the `record` DataFrame.

#### 1. Use `.dtypes` to find what datatype each column contains

```python
record.dtypes
```

    Student_Name    object
    Age              int64
    dtype: object
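
Knowing the dtypes is useful because they can drive later steps. As a small sketch, the code below rebuilds `record`, keeps only its numeric columns with `select_dtypes`, and converts `Age` to a floating-point dtype with `astype`. The names `numeric_only` and `age_as_float` are only illustrative.

```python
import pandas as pd

record = pd.DataFrame({
    "Student_Name": ["Ram", "Mohan", "Krishna", "Shivam"],
    "Age": [19, 20, 21, 24],
})

# Keep only the columns that hold numbers
numeric_only = record.select_dtypes(include="number")
print(list(numeric_only.columns))  # ['Age']

# Convert a column to another dtype; the original column is left unchanged
age_as_float = record["Age"].astype("float64")
print(age_as_float.dtype)          # float64
```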

#### 2. Use `.describe()` for a statistical overview

```python
print(record.describe())  # By default, only numeric columns are summarised
```

                 Age
    count   4.000000
    mean   21.000000
    std     2.160247
    min    19.000000
    25%    19.750000
    50%    20.500000
    75%    21.750000
    max    24.000000
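
Text columns are left out of the default summary, but they can still be described. The sketch below rebuilds `record`, summarises the `Student_Name` column on its own, and then asks for every column with `include="all"`. The names `name_summary` and `full_summary` are only illustrative.

```python
import pandas as pd

record = pd.DataFrame({
    "Student_Name": ["Ram", "Mohan", "Krishna", "Shivam"],
    "Age": [19, 20, 21, 24],
})

# A text column is summarised with count, unique, top and freq
name_summary = record["Student_Name"].describe()
print(name_summary)

# include="all" covers every column; statistics that do not apply are shown as NaN
full_summary = record.describe(include="all")
print(full_summary)
```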

#### 3. Use `.info()` to find information about the DataFrame

```python
record.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 2 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   Student_Name  4 non-null      object
     1   Age           4 non-null      int64
    dtypes: int64(1), object(1)
    memory usage: 196.0+ bytes
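
The `+` after the memory figure means the number is a lower bound: for `object` columns, pandas by default counts only the references to the Python strings, not the strings themselves. A minimal sketch of how to ask for a fuller measurement is shown below; the names `shallow` and `deep` are only illustrative, and the exact byte counts depend on the pandas version and platform.

```python
import pandas as pd

record = pd.DataFrame({
    "Student_Name": ["Ram", "Mohan", "Krishna", "Shivam"],
    "Age": [19, 20, 21, 24],
})

# Default measurement, per column (plus the index)
shallow = record.memory_usage()
print(shallow)

# deep=True also measures the contents of object columns
deep = record.memory_usage(deep=True)
print(deep)

# The same option is available on info()
record.info(memory_usage="deep")
```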
