| 1 | +# Introduction to the Pandas Library and DataFrames |
| 2 | + |
| 3 | +**Now that you have learnt Python programming, it's time for some applications.** |
| 4 | + |
| 5 | +- Machine Learning and Data Science are emerging fields. To work in them, your first step should be `Data Science`, because Machine Learning is all about data. |
| 6 | +- To begin with Data Science, your first tool will be the `Pandas Library`. |
| 7 | + |
| 8 | +## Introduction to the Pandas Library |
| 9 | + |
| 10 | +Pandas is a data analysis and manipulation tool built on top of the Python programming language. Pandas got its name from the term panel data (‘Pa’ from Panel and ‘da’ from data). For this tutorial, think of panel data as tabular data with rows and columns, like Excel spreadsheets or CSV files. |
| 11 | + |
| 12 | +**To use Pandas, we first have to import it.** |
| 13 | + |
| 14 | +## Why pandas? |
| 15 | + |
| 16 | +* Pandas provides a simple-to-use but very capable set of functions that you can use on your data. |
| 17 | +* It also integrates with other machine learning libraries, so it is important to learn. |
| 18 | + |
| 19 | +* For example, it is widely used to transform the data that a machine learning model will use during training. |
| 20 | + |
| 21 | + |
| 22 | +```python |
| 23 | +# Import pandas under its conventional alias "pd" |
| 24 | +import pandas as pd |
| 25 | +``` |
| 26 | + |
| 27 | +*To import any module in Python, use the `import module_name` statement. Here `pd` is used as a short alias for pandas, so we can type `pd` instead of `pandas` every time.* |
| 28 | + |
| 29 | + |
| 30 | +```python |
| 31 | +# Check the installed pandas version |
| 32 | +print(f"Pandas Version is : {pd.__version__}") |
| 33 | +``` |
| 34 | + |
| 35 | + Pandas Version is : 2.1.4 |
| 36 | + |
| 37 | + |
| 38 | +## Understanding Pandas data types |
| 39 | + |
| 40 | +Pandas has two main data types: `Series` and `DataFrame`. |
| 41 | + |
| 42 | +* `pandas.Series` is a 1-dimensional column of data. |
| 43 | +* `pandas.DataFrame` is a 2-dimensional data table with rows and columns. |
| 44 | + |
| 45 | +### 1. Series datatype |
| 46 | + |
| 47 | +**To create a Series, use `pd.Series()` and pass it a Python list.** |
| 48 | + |
| 49 | +Note: The S in `Series` is capital. Using a lowercase s (`pd.series`) raises an `AttributeError`. |
| 50 | + |
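As a small sketch of the note above, the snippet below deliberately uses the lowercase spelling and catches the resulting error. The variable `caught` is only an illustrative helper and is not part of pandas.

```python
import pandas as pd

caught = False
try:
    # Lowercase "s": pandas has no attribute with this name
    pd.series(["Honda", "Audi"])
except AttributeError as err:
    caught = True
    print(f"AttributeError: {err}")

# The correctly capitalised name works
cars = pd.Series(["Honda", "Audi"])
print(len(cars))
```
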
| 51 | +> Let's create a series |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | +```python |
| 56 | +# Creating a series of car companies |
| 57 | +cars = pd.Series(["Honda", "Audi", "Thar", "BMW"]) |
| 58 | +cars |
| 59 | +``` |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | + |
| 64 | + 0 Honda |
| 65 | + 1 Audi |
| 66 | + 2 Thar |
| 67 | + 3 BMW |
| 68 | + dtype: object |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | +The code above creates a Series named `cars`. The call `pd.Series(["Honda", "Audi", "Thar", "BMW"])` means: Hey pandas (`pd`), create a Series containing "Honda", "Audi", "Thar" and "BMW". |
| 73 | + |
| 74 | +The default index of a Series is 0, 1, 2, ... (remember, it starts from 0). |
| 75 | + |
| 76 | +To change the index of a Series, set the `index` parameter. It takes a list of index values: |
| 77 | + |
| 78 | + |
| 79 | +```python |
| 80 | +cars = pd.Series(["Honda", "Audi", "Thar", "BMW"], index=["A", "B", "C", "D"]) |
| 81 | +cars |
| 82 | +``` |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | + |
| 87 | + A Honda |
| 88 | + B Audi |
| 89 | + C Thar |
| 90 | + D BMW |
| 91 | + dtype: object |
| 92 | + |
| 93 | + |
| 94 | + |
| 95 | +You can see that the index has changed from numbers to A, B, C and D. |
| 96 | + |
| 97 | +The `dtype` shown at the bottom tells us the type of data stored in the Series. In the output above it is `object`, which is how this pandas version reports text values. |
| 98 | + |
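To see how the `dtype` follows the data, here is a minimal sketch with two numeric Series. The names `numbers` and `prices` are made up for illustration, and the exact integer width can depend on the platform.

```python
import pandas as pd

# Whole numbers are stored with an integer dtype
numbers = pd.Series([19, 20, 21, 24])

# Decimal numbers are stored with a floating-point dtype
prices = pd.Series([1.5, 2.0, 3.25])

print(numbers.dtype)
print(prices.dtype)
```
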
| 99 | +### 2. DataFrame datatype |
| 100 | + |
| 101 | +A DataFrame contains rows and columns, like a CSV file. |
| 102 | + |
| 103 | +You can create a DataFrame by using `pd.DataFrame()` and passing it a Python dictionary. |
| 104 | + |
| 105 | + |
| 106 | +```python |
| 107 | +# Create a DataFrame from a dictionary of lists |
| 108 | +cars_with_colours = pd.DataFrame({"Cars" : ["BMW","Audi","Thar","Honda"], |
| 109 | + "Colour" : ["Black","White","Red","Green"]}) |
| 110 | +print(cars_with_colours) |
| 111 | +``` |
| 112 | + |
| 113 | + Cars Colour |
| 114 | + 0 BMW Black |
| 115 | + 1 Audi White |
| 116 | + 2 Thar Red |
| 117 | + 3 Honda Green |
| 118 | + |
| 119 | + |
| 120 | +The dictionary keys are the `column names` and the values are the `column data`. |
| 121 | + |
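The link between the two data types can be sketched as follows: selecting one column of a DataFrame by its dictionary key gives back a Series. The variable name `colours` is only illustrative.

```python
import pandas as pd

cars_with_colours = pd.DataFrame({"Cars": ["BMW", "Audi", "Thar", "Honda"],
                                  "Colour": ["Black", "White", "Red", "Green"]})

# Select a single column using the dictionary key as the column name
colours = cars_with_colours["Colour"]

print(type(colours).__name__)  # Series
print(colours.tolist())        # ['Black', 'White', 'Red', 'Green']
```
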
| 122 | +*You can also create a DataFrame from Series.* |
| 123 | + |
| 124 | + |
| 125 | +```python |
| 126 | +# Let's create two series |
| 127 | +students = pd.Series(["Ram","Mohan","Krishna","Shivam"]) |
| 128 | +age = pd.Series([19,20,21,24]) |
| 129 | + |
| 130 | +students |
| 131 | +``` |
| 132 | + |
| 133 | + |
| 134 | + |
| 135 | + |
| 136 | + 0 Ram |
| 137 | + 1 Mohan |
| 138 | + 2 Krishna |
| 139 | + 3 Shivam |
| 140 | + dtype: object |
| 141 | + |
| 142 | + |
| 143 | + |
| 144 | + |
| 145 | +```python |
| 146 | +age |
| 147 | +``` |
| 148 | + |
| 149 | + |
| 150 | + |
| 151 | + |
| 152 | + 0 19 |
| 153 | + 1 20 |
| 154 | + 2 21 |
| 155 | + 3 24 |
| 156 | + dtype: int64 |
| 157 | + |
| 158 | + |
| 159 | + |
| 160 | + |
| 161 | +```python |
| 162 | +# Now let's create a DataFrame from the two Series above |
| 163 | +# Pass each Series as a dictionary value; the key becomes the column name |
| 164 | + |
| 165 | +record = pd.DataFrame({"Student_Name": students, |
| 166 | +                       "Age": age}) |
| 167 | +print(record) |
| 168 | +``` |
| 169 | + |
| 170 | + Student_Name Age |
| 171 | + 0 Ram 19 |
| 172 | + 1 Mohan 20 |
| 173 | + 2 Krishna 21 |
| 174 | + 3 Shivam 24 |
| 175 | + |
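One detail worth knowing when building a DataFrame from Series: pandas matches the values by index label, not by position. The sketch below uses two deliberately mismatched indexes (an assumption made only for illustration, the Series above share the default index) to show that labels missing from one Series become `NaN`.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (illustrative data)
students = pd.Series(["Ram", "Mohan"], index=[0, 1])
age = pd.Series([19, 20], index=[1, 2])

record = pd.DataFrame({"Student_Name": students,
                       "Age": age})

# Rows 0, 1 and 2 all appear; only label 1 has both a name and an age
print(record)
```

Because the Series in this tutorial all use the default index 0, 1, 2, ..., their rows line up as expected.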
| 176 | + |
| 177 | + |
| 178 | +```python |
| 179 | +# Show the column names |
| 180 | +record.columns |
| 181 | +``` |
| 182 | + |
| 183 | + |
| 184 | + |
| 185 | + |
| 186 | + Index(['Student_Name', 'Age'], dtype='object') |
| 187 | + |
| 188 | + |
| 189 | + |
| 190 | +### Describe Data |
| 191 | + |
| 192 | +**The good news is that pandas has many built-in functions which allow you to quickly get information about a DataFrame.** |
| 193 | +Let's explore the `record` DataFrame. |
| 194 | + |
| 195 | +#### 1. Use `.dtypes` to find the datatype of each column |
| 196 | + |
| 197 | + |
| 198 | +```python |
| 199 | +record.dtypes |
| 200 | +``` |
| 201 | + |
| 202 | + |
| 203 | + |
| 204 | + |
| 205 | + Student_Name object |
| 206 | + Age int64 |
| 207 | + dtype: object |
| 208 | + |
| 209 | + |
| 210 | + |
| 211 | +#### 2. Use `.describe()` for a statistical overview |
| 212 | + |
| 213 | + |
| 214 | +```python |
| 215 | +print(record.describe()) # By default, only numeric columns are summarized |
| 216 | +``` |
| 217 | + |
| 218 | + Age |
| 219 | + count 4.000000 |
| 220 | + mean 21.000000 |
| 221 | + std 2.160247 |
| 222 | + min 19.000000 |
| 223 | + 25% 19.750000 |
| 224 | + 50% 20.500000 |
| 225 | + 75% 21.750000 |
| 226 | + max 24.000000 |
| 227 | + |
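To make the numbers above less mysterious, the following sketch recomputes `mean` and `std` for the `Age` values by hand. It assumes, as the result confirms, that `std` is the sample standard deviation (dividing by `n - 1`). The names `sample_var` and `sample_std` are illustrative.

```python
import math
import pandas as pd

age = pd.Series([19, 20, 21, 24])

# Mean: sum of the values divided by how many there are
mean = age.sum() / len(age)

# Sample variance: squared distances from the mean, divided by n - 1
sample_var = ((age - mean) ** 2).sum() / (len(age) - 1)
sample_std = math.sqrt(sample_var)

print(mean)                  # 21.0
print(round(sample_std, 6))  # 2.160247

# Compare with the value reported by describe()
print(math.isclose(sample_std, age.describe()["std"]))  # True
```
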
| 228 | + |
| 229 | +#### 3. Use `.info()` to find information about the dataframe |
| 230 | + |
| 231 | + |
| 232 | +```python |
| 233 | +record.info() |
| 234 | +``` |
| 235 | + |
| 236 | + <class 'pandas.core.frame.DataFrame'> |
| 237 | + RangeIndex: 4 entries, 0 to 3 |
| 238 | + Data columns (total 2 columns): |
| 239 | + # Column Non-Null Count Dtype |
| 240 | + --- ------ -------------- ----- |
| 241 | + 0 Student_Name 4 non-null object |
| 242 | + 1 Age 4 non-null int64 |
| 243 | + dtypes: int64(1), object(1) |
| 244 | + memory usage: 196.0+ bytes |