Preprocessing + EDA - Jupyter Notebook
Importing data
Using a with statement (context manager)
To read a single line, use readline():
with open('data.txt') as file:  # 'data.txt' is a placeholder filename
    print(file.readline())
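A minimal, self-contained sketch of reading with a with block. It first writes a small made-up file to a temporary directory, then reads it back line by line. The block closes the file automatically on exit, and each readline() call picks up where the previous one stopped.

```python
import os
import tempfile

# Create a small sample file so the example runs on its own
# (the file name and contents are made up for illustration).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# 'with' opens the file and closes it automatically when the block exits,
# even if an exception is raised inside the block.
with open(path, "r") as file:
    first = file.readline()   # reads up to and including the first '\n'
    second = file.readline()  # continues from where the previous call stopped

print(first, end="")   # first line
print(second, end="")  # second line
print(file.closed)     # True: the file was closed on leaving the block
```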
NOTE: In Jupyter, prefixing a command with ! runs it in the system shell, so shell commands (e.g. !ls) can be executed directly from a notebook cell.
For mixed data types -> don't use np.loadtxt(); load the data into a pandas DataFrame instead (e.g. with pd.read_csv()).
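A short sketch of why this matters, using a made-up CSV with one text column. By default np.loadtxt tries to convert every value to a float, so the string column makes it raise a ValueError. pd.read_csv instead infers a separate dtype for each column.

```python
import io
import numpy as np
import pandas as pd

# Made-up data: one string column and two numeric columns.
csv_text = "name,age,score\nAna,29,88.5\nBen,35,92.0\n"

# np.loadtxt converts every field to float by default,
# so the 'name' column triggers a ValueError.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# pandas infers a dtype per column: text for 'name', int for 'age', float for 'score'.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```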
Consider a dataset loaded with pd.read_csv(). sep='\t' specifies that the file is tab-delimited. comment='#' gives the character after which the rest of a line is treated as a comment and not parsed. na_values takes a list of strings to recognize as NA/NaN: here the string 'Nothing' is recognized and replaced with NaN.
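A minimal sketch of those three parameters on a small inline dataset. The data is made up, and io.StringIO stands in for a real file path. The first line is a full-line comment and is skipped. 'Nothing' is not one of pandas' default missing-value markers, so it becomes NaN only because it is listed in na_values.

```python
import io
import pandas as pd

# Made-up tab-delimited data: '#' starts a comment, 'Nothing' marks a missing value.
raw = (
    "# weather readings (made-up data)\n"
    "city\ttemp\train\n"
    "Oslo\t4.5\t12\n"
    "Lima\tNothing\t0\n"
    "Pune\t31.0\tNothing\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",               # tab delimiter
    comment="#",            # ignore everything after '#' on a line
    na_values=["Nothing"],  # treat the string 'Nothing' as NaN
)
print(df)
```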
An .xls file is not a flat file: it is a spreadsheet that can contain many sheets, not a single table.
simpler way
BeautifulSoup (parsing HTML for web scraping)
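A minimal BeautifulSoup sketch, assuming the bs4 package is installed. It parses an inline HTML string rather than a downloaded page; in practice the HTML would usually come from an HTTP response (for example requests.get(url).text). The sketch pulls out the page title and every link's href.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a fetched web page.
html = """
<html><head><title>Sample page</title></head>
<body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

# 'html.parser' is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                              # text inside <title>
links = [a.get("href") for a in soup.find_all("a")]    # href of every <a> tag
texts = [a.get_text() for a in soup.find_all("a")]     # visible link text

print(title)
print(links)
```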
Cleaning
misleading
correct
Treat duplicates
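A short pandas sketch of finding and removing duplicate rows in a made-up DataFrame. df.duplicated() flags rows that repeat an earlier row. drop_duplicates() removes exact duplicates by default, or rows that repeat only the columns listed in subset. keep='first' (the default) retains the first occurrence.

```python
import pandas as pd

# Made-up data: row 2 repeats row 1 exactly; rows 3 and 4 share an id but differ in score.
df = pd.DataFrame({
    "id":    [1, 2, 2, 3, 3],
    "name":  ["Ana", "Ben", "Ben", "Cy", "Cy"],
    "score": [88, 92, 92, 75, 80],
})

# Boolean mask: True where a row repeats an earlier row in every column.
print(df.duplicated())

# Drop exact duplicate rows, keeping the first occurrence.
exact = df.drop_duplicates()

# Drop rows that repeat an earlier 'id', whatever the other columns contain.
by_id = df.drop_duplicates(subset=["id"], keep="first")

print(exact)
print(by_id)
```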