Commit 35baa35

Add files via upload
1 parent 741bb87 commit 35baa35

1 file changed

+395
-0
lines changed

@@ -0,0 +1,395 @@
# Introduction_to_Pandas_Library_and_DataFrames

> Content Creator - Krishna Kaushik

**Now that you have learnt Python programming, it's time for some applications.**

- Machine Learning and Data Science are emerging fields today. To work in them, your first step should be `Data Science`, because Machine Learning is all about data.
- To begin with Data Science, your first tool will be the `Pandas Library`.

## Introduction of Pandas Library

Pandas is a data analysis and manipulation tool built on top of the Python programming language. Pandas got its name from the term "panel data" (‘Pa’ from Panel and ‘da’ from data). In practice, pandas works with tabular data, that is, data arranged in rows and columns, such as Excel spreadsheets and CSV files.

**To use Pandas, we first have to import it.**

## Why pandas?

* Pandas provides a simple-to-use but very capable set of functions that you can use on your data.
* It is also integrated with other machine learning libraries, so it is important to learn it.
    * For example, it is widely used to transform the data that a machine learning model will use during training.

```python
# Importing pandas
import pandas as pd
```

*To import any module in Python, use the `import module_name` command. I used `pd` as an alias for pandas so that we can type `pd` instead of `pandas` every time.*

```python
# To check the installed pandas version
print(f"Pandas Version is : {pd.__version__}")
```

    Pandas Version is : 2.1.4

## Understanding Pandas data types

Pandas has two main data types: `Series` and `DataFrame`.

* `pandas.Series` is a 1-dimensional column of data.
* `pandas.DataFrame` is a 2-dimensional data table with rows and columns.

### 1. Series datatype

**To create a Series, call `pd.Series()` and pass a Python list inside the parentheses.**

Note: The S in `Series` is capital. If you use a small s (`pd.series`), Python raises an `AttributeError`.

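
A minimal sketch of what the capitalisation note means in practice, assuming pandas is installed. The lowercase name `pd.series` is not an attribute of the pandas module, so looking it up should fail before any data is created. The variable names `ok` and `error_name` are only for illustration.

```python
import pandas as pd

# The correct, capitalised constructor works
ok = pd.Series(["Honda", "Audi"])

# The lowercase spelling is not an attribute of the pandas module
try:
    pd.series(["Honda", "Audi"])
    error_name = None
except AttributeError as err:
    error_name = type(err).__name__
    print(f"{error_name}: {err}")
```
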
> Let's create a series

```python
# Creating a series of car companies
cars = pd.Series(["Honda", "Audi", "Thar", "BMW"])
cars
```

    0    Honda
    1     Audi
    2     Thar
    3      BMW
    dtype: object

The above code creates a Series named `cars`. The code `pd.Series(["Honda", "Audi", "Thar", "BMW"])` means: Hey pandas (`pd`), create a Series containing "Honda", "Audi", "Thar" and "BMW".

The default index of a Series is 0, 1, 2, … (remember, it starts from 0).

To change the index of a Series, set the `index` parameter. It takes a list of index values, one per element:

```python
cars = pd.Series(["Honda", "Audi", "Thar", "BMW"], index=["A", "B", "C", "D"])
cars
```

    A    Honda
    B     Audi
    C     Thar
    D      BMW
    dtype: object

You can see that the index has changed from numbers to A, B, C and D.

The `dtype` shown at the bottom tells us the type of data stored in the Series. Here it is `object`, which is how pandas reports text (string) data in the version used here.

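
As a small sketch of why a custom index is useful, the example below reads values back from the `cars` Series in two ways. It assumes the same data as above; the names `by_label` and `by_position` are only for illustration.

```python
import pandas as pd

cars = pd.Series(["Honda", "Audi", "Thar", "BMW"], index=["A", "B", "C", "D"])

# .loc looks values up by index label
by_label = cars.loc["B"]

# .iloc looks values up by position, which still starts from 0
by_position = cars.iloc[0]

print(by_label)
print(by_position)
print(list(cars.index))
```

Label-based access with `.loc` and position-based access with `.iloc` are kept separate here so that it stays clear which kind of lookup is being done.
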
### 2. DataFrames datatype

A DataFrame contains rows and columns, like a CSV file.

You can create a DataFrame by using `pd.DataFrame()` and passing it a Python dictionary.

```python
# Let's create a DataFrame of cars and their colours
cars_with_colours = pd.DataFrame({"Cars": ["BMW", "Audi", "Thar", "Honda"],
                                  "Colour": ["Black", "White", "Red", "Green"]})
cars_with_colours
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
    .dataframe tbody tr th {
        vertical-align: top;
    }
    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Cars</th>
      <th>Colour</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>BMW</td>
      <td>Black</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Audi</td>
      <td>White</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Thar</td>
      <td>Red</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Honda</td>
      <td>Green</td>
    </tr>
  </tbody>
</table>
</div>

Each dictionary key becomes a `column name` and its value becomes the `column data`.

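
To make the link between keys and columns concrete, here is a minimal sketch that rebuilds the same DataFrame and pulls one column back out by its key. The name `colours` is only for illustration.

```python
import pandas as pd

cars_with_colours = pd.DataFrame({"Cars": ["BMW", "Audi", "Thar", "Honda"],
                                  "Colour": ["Black", "White", "Red", "Green"]})

# Selecting one column by its name returns a Series
colours = cars_with_colours["Colour"]
print(type(colours).__name__)

# .shape is (number of rows, number of columns)
print(cars_with_colours.shape)
```

Each column of a DataFrame is itself a Series, which is why the two data types are usually learnt together.
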
*You can also create a DataFrame from Series.*

```python
# Let's create two series
students = pd.Series(["Ram", "Mohan", "Krishna", "Shivam"])
age = pd.Series([19, 20, 21, 24])

students
```

    0        Ram
    1      Mohan
    2    Krishna
    3     Shivam
    dtype: object

```python
age
```

    0    19
    1    20
    2    21
    3    24
    dtype: int64

```python
# Now let's create a dataframe with the help of above series
# pass the series name to the dictionary value

record = pd.DataFrame({"Student_Name": students,
                       "Age": age})
record
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
    .dataframe tbody tr th {
        vertical-align: top;
    }
    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Student_Name</th>
      <th>Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Ram</td>
      <td>19</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Mohan</td>
      <td>20</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Krishna</td>
      <td>21</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Shivam</td>
      <td>24</td>
    </tr>
  </tbody>
</table>
</div>

```python
# To print the list of column names
record.columns
```

    Index(['Student_Name', 'Age'], dtype='object')

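
When a DataFrame is built from Series, as `record` was, pandas matches the rows by index label rather than by order. The sketch below uses two hypothetical Series whose indexes only partly overlap (these values are not part of the original example) to show what that matching looks like.

```python
import pandas as pd

# Hypothetical Series whose indexes only partly overlap
students = pd.Series(["Ram", "Mohan"], index=[0, 1])
age = pd.Series([19, 20], index=[1, 2])

partial = pd.DataFrame({"Student_Name": students, "Age": age})

# Rows are matched by index label; labels missing from one Series become NaN
print(partial)
print(partial.isna().sum())
```

In the `record` example both Series share the default index 0 to 3, so every row lines up and no values are missing.
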
### Describe Data

**The good news is that pandas has many built-in functions and attributes that let you quickly get information about a DataFrame.**

Let's explore the `record` DataFrame.

#### 1. Use `.dtypes` to find what datatype each column contains

```python
record.dtypes
```

    Student_Name    object
    Age              int64
    dtype: object

#### 2. Use `.describe()` for a statistical overview

```python
record.describe()  # By default, only numeric columns are summarised
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
    .dataframe tbody tr th {
        vertical-align: top;
    }
    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>4.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>21.000000</td>
    </tr>
    <tr>
      <th>std</th>
      <td>2.160247</td>
    </tr>
    <tr>
      <th>min</th>
      <td>19.000000</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>19.750000</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>20.500000</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>21.750000</td>
    </tr>
    <tr>
      <th>max</th>
      <td>24.000000</td>
    </tr>
  </tbody>
</table>
</div>

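
The numbers in the `.describe()` table can be checked by hand. The sketch below, which assumes the same ages as the `record` DataFrame, recomputes the `std` row as a sample standard deviation (dividing by n - 1) and the `50%` row as the median. It also shows the `include="all"` option, which asks `.describe()` to summarise non-numeric columns too. The names `sample_std` and `summary` are only for illustration.

```python
import math
import pandas as pd

age = pd.Series([19, 20, 21, 24])

# describe() reports the sample standard deviation (divides by n - 1)
mean = age.mean()
sample_std = math.sqrt(sum((x - mean) ** 2 for x in age) / (len(age) - 1))
print(round(sample_std, 6), round(age.std(), 6))

# The 50% row is the median
print(age.median())

# include="all" asks describe() to summarise non-numeric columns too
record = pd.DataFrame({"Student_Name": ["Ram", "Mohan", "Krishna", "Shivam"],
                       "Age": age})
summary = record.describe(include="all")
print(list(summary.columns))
```
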
#### 3. Use `.info()` to find information about the DataFrame

```python
record.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 2 columns):
     #   Column        Non-Null Count  Dtype 
    ---  ------        --------------  ----- 
     0   Student_Name  4 non-null      object
     1   Age           4 non-null      int64 
    dtypes: int64(1), object(1)
    memory usage: 196.0+ bytes
