Data Type in Python

This document outlines the process of importing data in Python, focusing on flat files such as .txt and .csv. It explains the importance of understanding file types, headers, and delimiters, and introduces libraries like NumPy and pandas for data importation. The document emphasizes best practices for handling file connections and using data structures effectively in data science.

Uploaded by saadia


Second Course:
Importing Data in Python

In this course we will learn to import data from a large variety of sources, for example:
(i) flat files such as .txt and .csv files;
(ii) files native to other software, such as Excel spreadsheets and Stata, SAS and MATLAB files.

First off, we're going to learn how to import basic text files, which we can broadly classify into 2 types:
1. those containing plain text, such as the opening of Mark Twain's novel The Adventures of Huckleberry Finn;
2. those containing records, that is, table data, such as 'titanic.csv', in which each row is a unique passenger onboard and each column is a characteristic or feature, such as gender, cabin and 'survived or not'. The latter is known as a flat file.
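To read a plain text file, you first need to open a connection to the file. To do so, you assign the filename to a variable as a string, then pass the filename to the function open() along with the argument mode='r', which opens the file for reading only.

Next, assign the contents of the file to a variable text by applying the file's read() method.

Now print text to check it. When you're done, close the connection by calling its close() method.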
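A minimal sketch of these steps follows. So that it runs on its own, it first writes a small sample file to a temporary directory; the file name sample.txt and its contents are hypothetical, not part of the course material.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file so the sketch is self-contained
filename = str(Path(tempfile.mkdtemp()) / "sample.txt")
Path(filename).write_text("This is a hypothetical line of plain text.\n")

# Open a connection to the file in read-only mode
file = open(filename, mode="r")

# Read the whole file into a single string
text = file.read()

# Check the text, then close the connection
print(text)
file.close()
```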

It is good to know how to write data to a file (by opening it with mode='w'), but we will not do so in this course.

You can avoid having to close the connection to the file by using a with statement: with open(filename, 'r') as file:

What you're doing here is called 'binding' a variable in the context manager construct: while still within this construct, the variable file will be bound to open(filename, 'r'). It is best practice to use the with statement, as you never have to concern yourself with closing the files again.
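As a sketch, reading a file with a with statement looks like this (the sample file is again hypothetical). Once the indented block ends, Python closes the file for us, which the file object's closed attribute confirms.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file so the sketch is self-contained
filename = str(Path(tempfile.mkdtemp()) / "sample.txt")
Path(filename).write_text("first line\nsecond line\n")

# Inside the with block, the variable file is bound to open(filename, 'r')
with open(filename, "r") as file:
    text = file.read()

# Leaving the block closed the connection automatically
print(file.closed)  # True
```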
The importance of flat files in data science

Flat Files:
Flat files are basic text files containing records, that is, table data. In 'titanic.csv', for example, each row or record is a unique passenger onboard and each column is a feature or attribute, such as name, gender and cabin.

It is also essential to note that a flat file can have a header, such as in 'titanic.csv': a first row that names the columns rather than holding data. It will be important to know whether or not your file has a header, as it may alter your data import.
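To see how a header can alter an import, here is a sketch using pandas (introduced later in this chapter) on a tiny CSV whose passenger rows are invented for illustration. By default the first row becomes the column names, whereas header=None keeps it as an ordinary data row.

```python
import io

import pandas as pd

# Hypothetical flat file with a header row (invented values)
csv_text = "name,gender,cabin\nPassenger A,female,C1\nPassenger B,male,C2\n"

# Default: the first row is read as the header (column names)
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first row is read as data instead
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)  # (2, 3)
print(no_header.shape)    # (3, 3)
```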

File extension:

'titanic.csv' has the extension .csv, which stands for comma-separated values: the values in each row are separated by commas. Another common extension for a flat file is .txt, which means a text file. Values in flat files can be separated by characters or sequences of characters other than commas, such as a tab, and the character or characters in question are called a delimiter.
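One quick way to see what a delimiter does is Python's built-in csv module, whose reader takes a delimiter argument (',' by default). This sketch uses a made-up two-row table stored both ways; reading the tab-delimited version with the wrong delimiter leaves each row as a single unsplit field.

```python
import csv
import io

# The same hypothetical table, once comma-delimited and once tab-delimited
comma_text = "name,gender,cabin\nPassenger A,female,C1\n"
tab_text = "name\tgender\tcabin\nPassenger A\tfemale\tC1\n"

comma_rows = list(csv.reader(io.StringIO(comma_text)))              # default ','
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))  # tab delimiter
wrong_rows = list(csv.reader(io.StringIO(tab_text)))                # wrong delimiter

print(comma_rows == tab_rows)  # True
print(wrong_rows[0])           # ['name\tgender\tcabin']
```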

An example of a tab-delimited file is the famous MNIST digit recognition dataset, in which each row contains the pixel values of a given image. Note that all fields in the MNIST data are numeric, while 'titanic.csv' also contains strings. If the data consist entirely of numbers and we want to store them as a NumPy array, we can use numpy.

If, instead, we want to store the data in a DataFrame, we can use pandas.

In the rest of this chapter, you'll learn how to import flat files that contain only numerical data, such as the MNIST data, and flat files that contain both numerical data and strings, such as 'titanic.csv'.

Importing flat files using NumPy

What if you want to import a flat file and assign it to a variable? If all the data are numerical, you can use the package numpy to import the data as a NumPy array.

Why NumPy?

NumPy arrays are often essential for other packages, such as scikit-learn, a popular machine learning package for Python. NumPy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays. Enter the NumPy functions:
- loadtxt and
- genfromtxt.

To use either of these, we first need to import NumPy. We then call loadtxt and pass it the filename as the first argument, along with the delimiter as the second argument. Note that the default delimiter is any whitespace, so we'll usually need to specify it explicitly.
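A minimal sketch of loadtxt on comma-delimited numeric data. To stay self-contained it reads from an in-memory io.StringIO object holding hypothetical pixel-like values; loadtxt also accepts a filename string. Because the separator is a comma rather than whitespace, delimiter=',' has to be given.

```python
import io

import numpy as np

# Hypothetical comma-delimited numeric data (stands in for a file on disk)
file = io.StringIO("0,255,128\n64,0,32\n")

# Filename (or file object) first, then the delimiter
data = np.loadtxt(file, delimiter=",")

print(data.shape)  # (2, 3)
print(data.dtype)  # float64
```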
If you want only the 1st and 3rd columns of the data, set usecols to the list containing the ints 0 and 2, that is, usecols=[0, 2] (column indices start at 0).
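For example, on a hypothetical three-column file, usecols=[0, 2] keeps only the first and third columns:

```python
import io

import numpy as np

# Hypothetical three-column, comma-delimited data
file = io.StringIO("1,2,3\n4,5,6\n")

# Keep only columns 0 and 2 (the 1st and 3rd columns)
data = np.loadtxt(file, delimiter=",", usecols=[0, 2])

print(data)
# [[1. 3.]
#  [4. 6.]]
```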

You can also import different datatypes into NumPy arrays: for example, setting the argument dtype=str will ensure that all entries are imported as strings. This is useful when a table mixes strings and floats in different columns.
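A sketch of dtype=str on a small table that mixes strings and numbers (the values are invented). Every entry, including the header row, comes back as a string, so numeric columns would still need converting before any arithmetic.

```python
import io

import numpy as np

# Hypothetical table mixing integers, strings and floats, with a header row
file = io.StringIO("survived,sex,fare\n1,female,10.5\n0,male,7.0\n")

# dtype=str imports every entry as a string, so mixed columns load without errors
data = np.loadtxt(file, delimiter=",", dtype=str)

print(data[0])     # ['survived' 'sex' 'fare']
print(data[1, 2])  # 10.5 (a string, not a float)
```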
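Importing flat files using pandas

The need for a data structure that holds labeled rows and columns of mixed data types, the DataFrame, prompted Wes McKinney to develop the pandas library for Python.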

Nothing speaks to the purpose of pandas more than its documentation.

As Hadley Wickham tweeted, "A matrix has rows and columns. A data frame has observations and variables."
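One way to see the distinction in the quote, as a sketch with invented values: a NumPy array (a matrix) holds a single dtype, so mixing numbers and strings coerces everything to strings, while a pandas DataFrame keeps a separate dtype for each variable (column).

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array has one dtype for all entries: the ints become strings
arr = np.array([[1, "female"], [0, "male"]])

# A DataFrame stores each column (variable) with its own dtype
df = pd.DataFrame({"survived": [1, 0], "sex": ["female", "male"]})

print(arr.dtype.kind)             # U  (Unicode string)
print(df["survived"].dtype.kind)  # i  (integer)
```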
For all of these reasons, it is now standard and best practice in data science to use pandas to import flat files as DataFrames.
To use pandas, you first need to import it. Then, to import a CSV file in the most basic case, all we need to do is call the function read_csv() and supply it with a single argument, the name of the file. Having assigned the DataFrame to the variable data, we can check its first 5 rows, with the column names taken from the header, using the command data.head().
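A minimal sketch of read_csv() and head(), reading a hypothetical seven-row CSV from an in-memory buffer instead of a file name (read_csv accepts either). head() returns the first 5 rows by default.

```python
import io

import pandas as pd

# Hypothetical CSV text with a header row and 7 data rows
lines = ["id,value"] + [f"{i},{i * 10}" for i in range(1, 8)]
csv_file = io.StringIO("\n".join(lines) + "\n")

# Basic case: a single argument (normally the file name)
data = pd.read_csv(csv_file)

# Inspect the first 5 rows; the header supplies the column names
print(data.head())
```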
