Data File Handling
Friday, May 20, 2022 12:48 PM
Programs:
Transient | Persistent
Transient:
Run for a short period and produce the output, but when they end the data disappears, as the data is saved in RAM (volatile memory) as temporary memory.
Persistent:
Run for a longer period and save some data in permanent storage. If closed, they execute from the same point.
E.g.: Operating Systems
NEED:
In a general program, after execution neither the input nor the output is saved for future use. Hence, in order to store the output of a program in a file and to perform various operations on it, we require Data File Handling.
Data File Handling In PYTHON:
Python allows us to read and save data to external files permanently in secondary storage.
The script with .py extension is also saved to the secondary storage permanently.
Files:
Data structures where data is packaged to store in devices.
• It is a stream of bytes, comprising data of interest.
• The data maintained inside a file is termed persistent.
• They provide a means of communication between the program and the world.
It INVOLVES:
I) Opening the file
II) Performing operations
III) Closing the file
TEXT FILES
1. A text file is usually considered as a sequence of lines.
2. It is a simple ASCII/UNICODE sequence.
3. A line is a sequence of characters stored on permanent storage media.
4. Each line is terminated by a special character, known as End of Line (EOL). In Python, the EOL character is '\n'.
5. At the lowest level, a text file is a collection of bytes.
6. Text files are stored in human readable form and can be created using any text editor.
7. Examples:
Document Files: .txt, .rtf
Tabular Files: .csv, .tsv
Source Code Files: .py, .js, .c, .app, .java
Web Standard Files: .html, .xml, .css, .json
Configuration Files: .ini, .cfg, .reg
BINARY FILES
1. A binary file contains arbitrary binary data, i.e. numbers stored in the file can be used for numerical operation(s).
2. Hence, working on a binary file means interpreting the raw bit pattern(s) read from it as the correct data type in the program.
3. In the case of a binary file, it is extremely important to interpret the correct data type while reading the file. It is possible to interpret a stream of bytes (originally a string) as a numeric value, but the result is often incorrect and does not give the desired output after file processing.
4. Python provides special module(s) for encoding and decoding of data for binary files.
5. Binary files are stored in non-human readable form and need programs to access their contents.
6. Used to store binary data such as images, video files, audio files, etc.
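The point about interpreting raw bytes as the correct data type can be seen in a minimal sketch. The file name demo.bin and the four sample bytes are made up for illustration, and the file is created in a temporary folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical file name

# Write four raw bytes in binary mode.
with open(path, "wb") as f:
    f.write(b"ABCD")

# Read them back; binary mode returns bytes, not str.
with open(path, "rb") as f:
    raw = f.read()

as_text = raw.decode("ascii")            # interpret the bytes as characters
as_number = int.from_bytes(raw, "big")   # interpret the same bytes as one integer

print(as_text)
print(as_number)
```

The same four bytes give either a four-letter string or one large integer, so the program has to know which data type was stored.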
CSV FILES
(comma-separated values)
1. A CSV file is just like a text file: it is in human readable format, except that it is used to store data in tabular form, with each line in the file treated as a record.
2. It is the most preferred import and export format for databases and spreadsheets.
3. The separator character of CSV files is called a delimiter.
comma (,)
tab ('\t')
colon (:)
pipe (|)
semicolon (;)
OPENING A FILE
open()
1. The open() function takes the name of the file as the first argument.
2. Syntax: <file variable>/<file object or handle> = open(file_name, access_mode)
For a file 'abc',
f=open('abc')
3. In the given syntax we notice the following elements:
i) file_object
It establishes a link between the program and the data file. (Also referred to as file handle or object).
ii) access_mode
It defines which operations are allowed on the file (reading, writing, appending) and where the file pointer starts (the position from which data is read or written). If omitted, the mode is 'r'.
4. Modes for opening a file:
(r) Read Mode: to read the file.
(w) Write Mode: to write to the file.
(a) Append Mode: to write at the end of the file.
5. When the file is not found in read mode, a "FileNotFoundError" is generated.
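A minimal sketch of the points above, assuming a file named 'abc' inside a temporary folder (the folder and the variable names found and default_mode are only for illustration):

```python
import os
import tempfile

folder = tempfile.mkdtemp()
missing = os.path.join(folder, "abc")   # hypothetical file that does not exist yet

# Read mode needs an existing file, so this raises FileNotFoundError.
try:
    f = open(missing, "r")
except FileNotFoundError:
    found = False
else:
    found = True
    f.close()

# Write mode creates the file, after which read mode succeeds.
f = open(missing, "w")
f.close()

f = open(missing)        # access_mode omitted, so the default 'r' is used
default_mode = f.mode
f.close()

print(found, default_mode)
```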
CLOSING A FILE
close()
1. The close() function flushes any unwritten information to the file and closes the file object.
2. Syntax: <file variable>/<file object or handle>.close()
For a file 'abc' open under file handle 'f',
f.close()
3. f.closed tells us whether the file object is closed: it is True after close() and False while the file is open.
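As a small illustration of close() and f.closed (the file is created in a temporary folder and the variable names are assumptions made for this sketch):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "abc")  # hypothetical file name

f = open(path, "w")
f.write("hello\n")
before = f.closed      # False: the file is still open
f.close()              # flushes buffered data and releases the file
after = f.closed       # True: the handle can no longer be used

# Using a closed file object raises ValueError.
try:
    f.write("more")
except ValueError:
    write_failed = True
else:
    write_failed = False

print(before, after, write_failed)
```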
FILE MODES
Reading Mode:
r: Reading only. Default mode. File pointer at beginning.
r+: Both reading and writing. File pointer at beginning.
rb: Reading only in binary format. File pointer at beginning.
rb+: Both reading and writing in binary format. File pointer at beginning.
The file must already exist, otherwise FileNotFoundError is generated.
Writing Mode:
w: Writing only. File pointer at beginning (the file is emptied first). Overwrites existing file.
w+: Both writing and reading. File pointer at beginning. Overwrites existing file.
wb: Writing only in binary format. File pointer at beginning. Overwrites existing file.
wb+: Both writing and reading in binary format. File pointer at beginning. Overwrites existing file.
Creates new file for the particular mode if it doesn't exist.
Appending Mode:
a: Appending only. File pointer at end.
a+: Both appending and reading. File pointer at end.
ab: Appending only in binary format. File pointer at end.
ab+: Both appending and reading in binary format. File pointer at end.
Creates new file for the particular mode if it doesn't exist.
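The difference between the mode families can be checked with a short sketch. The file name modes.txt and the sample strings are made up, and the file lives in a temporary folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

with open(path, "w") as f:      # creates the file
    f.write("first\n")

with open(path, "w") as f:      # 'w' again: old contents are discarded
    start_w = f.tell()          # 0, because the file was emptied
    f.write("second\n")

with open(path, "r+") as f:     # 'r+': contents kept, pointer at the beginning
    start_rplus = f.tell()
    f.write("S")                # overwrites the first character in place

with open(path, "a+") as f:     # 'a+': writes always go to the end
    f.write("third\n")
    f.seek(0)                   # move back to the start before reading
    contents = f.read()

print(repr(contents))
```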
READING A FILE
read() or read(n)
x=file.read()
x=file.read(n)
• Reads the entire file, or the specified 'n' characters (bytes in binary mode).
• File pointer goes from its current position (the beginning, for a freshly opened file) to the end of the file.
• Reads as a string.
• When n is omitted or negative, reads the entire file.
• Also reads the EOL and keeps it in the string.
readline()
x=file.readline()
• Reads only one line at a time.
• File pointer goes from cursor (beginning) to EOL.
• Reads as a string.
• Terminates after the EOL. At the end of the file it returns an empty string.
• As it returns a string, the result can be manipulated.
readlines()
x=file.readlines()
• Reads all lines in the text file.
• Reads as a list of strings, each ending with '\n'.
• At the end of the file it returns an empty list.
If we do not use the close statement after writing, the data may remain in the buffer and not reach the file; close() flushes the buffer so that the data is written.
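The three reading functions can be compared on one small file. This is only a sketch: the file name poem.txt and its three lines are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

f = open(path, "r")
first_four = f.read(4)        # n characters, EOL included
next_line = f.readline()      # continues from the current position
rest = f.readlines()          # remaining lines as a list of strings
at_eof_line = f.readline()    # empty string at end of file
at_eof_lines = f.readlines()  # empty list at end of file
f.seek(0)
whole = f.read(-1)            # negative n reads everything
f.close()

print(repr(first_four), repr(next_line), rest)
```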
WRITING TO A FILE
write()
x=file.write(string)
• Takes a single str and writes it to the file; x receives the number of characters written.
• To store with EOL, it needs to be specified at the end of the string.
• The entire argument must be a string.
• New file is created when it doesn't exist.
• In 'w' mode the existing file gets overwritten each time it is opened (old data is lost).
writelines()
file.writelines(sequence)
• Writes every string in a sequence (list, tuple, etc.) to the text file.
• To store with EOL, it needs to be specified at the end of each string in the argument.
• Accepts any sequence whose items are all strings.
• New file is created when it doesn't exist.
• In 'w' mode the existing file gets overwritten each time it is opened (old data is lost).
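A short sketch of both writing functions, assuming a file called notes.txt in a temporary folder and some made-up strings:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

f = open(path, "w")
count = f.write("alpha\n")            # returns the number of characters written
f.writelines(["beta\n", "gamma"])     # no EOL is added automatically
f.writelines(("\n", "delta\n"))       # a tuple of strings works as well
f.close()

with open(path) as f:
    lines = f.readlines()

print(count, lines)
```

Because "gamma" was written without '\n', the next string written continues on the same line until an EOL is supplied.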
with statement
• with open("filename", "filemode") as fileobj:
      fileobj.write("argument1 EOLchar")
      fileobj.write("argument2 EOLchar")
• Used to group file operation statements within a block to make code more compact and readable.
• Ensures that all resources allocated to the file object get deallocated automatically once the block ends, i.e. the file is closed even if an error occurs inside the block.
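The automatic closing can be verified with a small sketch. The file name with_demo.txt and the simulated RuntimeError are assumptions made only to show the behaviour.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name

with open(path, "w") as fileobj:
    fileobj.write("argument1\n")
    fileobj.write("argument2\n")
    open_inside = not fileobj.closed   # True while the block is running

closed_after = fileobj.closed          # True: closed automatically on exit

# The file is closed even when the block is left through an exception.
try:
    with open(path) as fileobj:
        raise RuntimeError("simulated failure")
except RuntimeError:
    closed_after_error = fileobj.closed

print(open_inside, closed_after, closed_after_error)
```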
APPENDING A FILE
fileobj=open("filename", "a")
Append means 'to add', hence the data written under the 'a' mode is added to the file, unlike in 'w' mode, which overwrites the file.
We can deduce that:
• If the file exists, it will not be erased, and if it doesn't exist then it will be created.
• When data is written, it gets added to the end of the file, which means that the file pointer is at the end of the file.
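Both deductions can be tried out in a few lines. The file name log.txt and the two entries are invented for this sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file name

# First open: the file does not exist yet, so 'a' creates it.
fileobj = open(path, "a")
fileobj.write("entry 1\n")
fileobj.close()

# Second open: existing data is kept and new data lands after it.
fileobj = open(path, "a")
fileobj.write("entry 2\n")
fileobj.close()

with open(path) as f:
    entries = f.read().splitlines()

print(entries)
```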
BINARY FILES
(READING AND WRITING A FILE)
The Pickle Module:
• It is used to read and write structures such as lists and dictionaries.
• Used for serialising and deserialising.
Serialising or pickling is the transformation of data/object in RAM to a byte stream for storage on disk or in a database, or for sending through a network.
It refers to the process of converting the structure to a byte stream before writing to the file,
whereas
Unpickling (deserialising) refers to converting the byte stream back to the original data structure.
• As writing and reading in Python work with str parameters (bytes for binary files), conversion is necessary for other objects.
• Hence the module can be used to store almost any kind of Python object in a binary file, as it keeps the objects along with their structure.
• Steps:
Import the pickle module.
Open the binary file in the file object, in the required mode.
To read ("rb"):
Use pickle.load(fileobject)
To write ("wb"):
Take the desired input in a variable e.g.- x.
Now use pickle.dump(x, fileobject)
Close the file using fileobject.close()
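The steps above can be sketched as follows. The file name student.dat and the dictionary contents are made-up sample data.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "student.dat")  # hypothetical file name
x = {"roll": 1, "name": "Asha", "marks": [78, 91]}       # made-up sample data

# Pickling: convert the dictionary to a byte stream and write it.
f = open(path, "wb")
pickle.dump(x, f)
f.close()

# Unpickling: read the byte stream and rebuild the dictionary.
f = open(path, "rb")
y = pickle.load(f)
f.close()

print(y == x, y is x)
```

The loaded object is equal to the original but is a new object in memory, which is what rebuilding from a byte stream implies.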
OPERATIONS IN A BINARY FILE
Inserting/Appending (File mode: wb, or ab to keep the records already in the file)
Step1: Import Pickle module.
Step2: Add record using dump() method.
Reading (File mode: rb)
Step1: Import Pickle Module.
Step2: Print using load() method.
Searching (File mode: rb)
Step1: Import Pickle Module.
Step2: load() method takes all the data in a variable e.g.- r.
Step3: The element to be searched is taken in another variable- x.
Step4: For loop in r is initiated with an if loop_var[0]==x
Step5: If the element is found, the statement is printed, or else, if not found, an error message is given.
Updating (File mode: rb+)
Step1: Import Pickle Module.
Step2: load() method is used and the element requiring change is searched.
Step3: If found, the new data is written in a variable and written using dump(), after moving the file pointer back to the beginning of the file.
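A possible sketch of the searching and updating steps, assuming the records are stored as one pickled list of [roll, name, marks] lists. The function names search and update_marks, the file name and the sample records are all hypothetical.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.dat")  # hypothetical file name

# Insert: store the whole list of records with one dump() call.
with open(path, "wb") as f:
    pickle.dump([[1, "Asha", 78], [2, "Ravi", 64]], f)

def search(path, x):
    """Return the record whose first field equals x, or None."""
    with open(path, "rb") as f:
        r = pickle.load(f)
    for rec in r:
        if rec[0] == x:
            return rec
    return None

def update_marks(path, x, new_marks):
    """Change the marks of record x in place; return True if it was found."""
    with open(path, "rb+") as f:
        r = pickle.load(f)
        for rec in r:
            if rec[0] == x:
                rec[2] = new_marks
                f.seek(0)          # go back to the start before rewriting
                pickle.dump(r, f)
                f.truncate()       # drop leftover bytes if the new data is shorter
                return True
    return False

print(search(path, 2))
print(update_marks(path, 2, 70), search(path, 2))
```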
EXCEPTION HANDLING
try block:
• Contains the code that is to be run.
• Includes statements that might generate some error or exception.
except block:
• Runs when an error or exception occurs in the try block.
Exception is the base class of almost all built-in exception types in Python, so "except Exception" catches any of them.
Can be used when the type of error is not known in advance.
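One way the try and except blocks might be combined with file handling is sketched here. The helper name load_all and the file names are hypothetical. The sketch relies on pickle.load() raising EOFError when no more data is left, which these notes do not cover elsewhere.

```python
import os
import pickle
import tempfile

def load_all(path):
    """Return every pickled record in the file, or [] if the file is missing."""
    records = []
    try:
        with open(path, "rb") as f:
            while True:
                records.append(pickle.load(f))   # raises EOFError at end of file
    except FileNotFoundError:
        print("File not found:", path)
    except EOFError:
        pass                                     # normal end of the records
    except Exception as err:                     # any other, unexpected error
        print("Could not read file:", err)
    return records

folder = tempfile.mkdtemp()
path = os.path.join(folder, "items.dat")         # hypothetical file name
with open(path, "wb") as f:
    for item in ("pen", "book"):
        pickle.dump(item, f)

print(load_all(path))
print(load_all(os.path.join(folder, "missing.dat")))
```

Specific exceptions are listed first and the general Exception last, so the general block only handles errors that were not anticipated.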
RANDOM ACCESS IN FILES
seek()
• Changes position of the file pointer (handle, cursor) to a given, specific position.
• Syntax: f.seek(offset, from_what), where from_what is the reference point for the offset:
0= Beginning of the file, the default positioning.
1= Current position of the file pointer.
2= End of the file.
• seek() can be used in two ways:
i) Absolute Positioning
The file pointer is placed at a fixed offset counted from the beginning of the file.
Syntax: f.seek(file_location)
e.g.- f.seek(20) places the file pointer 20 bytes from the beginning of the file.
ii) Relative Positioning
The file pointer is moved relative to its current position or to the end of the file. For a non-zero offset the file must be opened in binary mode.
Syntax: f.seek(offset, from_what)
e.g.- f.seek(20,1) places the file pointer 20 bytes forward from the current position.
f.seek(-10,1) places the file pointer 10 bytes backward from the current position.
f.seek(-10,2) places the file pointer 10 bytes before the end of the file.
tell()
• Syntax: f.tell()
• Returns current position of the file pointer.
• In reading or writing mode, the file pointer is at 0 bytes when the file is opened.
• In append mode, the file pointer is at the end of the file, just after the last byte.
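A minimal sketch of seek() and tell() on a 26-byte file. The file name letters.bin and its contents are made up, and binary mode is used so that relative positioning is allowed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "letters.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")

f = open(path, "rb")          # binary mode, so relative seeks are allowed
start = f.tell()              # pointer at the beginning
f.seek(20)                    # absolute: 20 bytes from the beginning
at_20 = f.read(1)             # reading one byte moves the pointer to 21
f.seek(-10, 1)                # relative: 10 bytes back from the current position
at_11 = f.read(1)
f.seek(-3, 2)                 # relative: 3 bytes before the end
tail = f.read()
end = f.tell()                # pointer now at the end of the file
f.close()

f = open(path, "ab")
append_pos = f.tell()         # append mode starts at the end
f.close()

print(start, at_20, at_11, tail, end, append_pos)
```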
CSV FILES
(COMMA SEPARATED VALUES)
• Each line of the file is called a record.
• Each record consists of fields separated by commas (delimiter).
• Used for storing tabular data in spreadsheet or database.
• The tabular data is stored as text.
• Advantages:
i) faster
ii) smaller in size
iii) easy to generate and import into a spreadsheet or database
iv) human readable and easy to edit
v) simple to interpret and parse
vi) processed by almost all existing applications.
Reading
Syntax: csv.reader(fileobject)
reader() returns an iterable object which reads the CSV file line by line.
Step1: import csv
Step2: fileobject=open("filename", "r")
Step3: variable=csv.reader(fileobject)
Step4: Iterate
    for row in variable:
        print(row)
Step5: Close the file.
    fileobject.close()
Writing
Syntax: csv.writer(fileobject)
The writer() object converts the user's data into a delimited string.
Step1: import csv
Step2: In a variable fields, take the field names in list form.
Step3: In a variable rows, take the field data in form of lists.
Step4: fileobject=open("filename", "w")
Step5: variable=csv.writer(fileobject, delimiter=',')
Step6: Write the fields to the csv file.
    variable.writerow(fields)
Step7: Write the data row-wise. This can be done two ways: by iterating using writerow, or directly, all at once, using writerows.
    for i in rows:
        variable.writerow(i)
    or
    variable.writerows(rows)
Step8: Close the file.
    fileobject.close()
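Both procedures can be put together in one sketch. The file name marks.csv, the field names and the two rows are made-up sample data. The with statement replaces the explicit close(), and newline="" is passed to open() as the csv module documentation recommends.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "marks.csv")  # hypothetical file name
fields = ["roll", "name", "marks"]                     # made-up sample data
rows = [[1, "Asha", 78], [2, "Ravi", 64]]

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as fileobject:
    writer = csv.writer(fileobject, delimiter=",")
    writer.writerow(fields)      # header record
    writer.writerows(rows)       # all data records at once

with open(path, "r", newline="") as fileobject:
    reader = csv.reader(fileobject)
    records = [row for row in reader]

for row in records:
    print(row)
```

Every field comes back as a string, so numeric columns such as marks need converting with int() before calculations.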