UNIT-4
FILES,
EXCEPTIONS
MODULES
FILE OBJECTS in PYTHON
The file object is the default and easiest
way to manipulate files in Python. It
provides a set of methods and attributes
that make it easier for developers to read
from, and write to, files in the filesystem.
File objects can be used to access not only
normal disk files, but also any other type
of "file" that uses that abstraction.
With a file object, you can read and/or write data to a
file as seen by the underlying operating system.
Python reacts to any I/O error related to a file object
by raising an instance of the built-in exception class
IOError (an alias of OSError in Python 3).
Errors that cause this exception include open()
failing to open or create a file, and
calls to a method on a file object to which
that method doesn't apply
(e.g., calling write on a read-only file
object, or calling seek on a nonseekable
file).
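A minimal sketch of catching such an error, assuming Python 3 (where IOError is kept as an alias of OSError). The file name is hypothetical and is placed in a fresh temporary directory so that it cannot exist.

```python
import os
import tempfile

# Hypothetical path inside a new, empty temporary directory.
missing_path = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

try:
    handle = open(missing_path, "r")   # mode 'r' requires an existing file
except IOError as err:                 # alias of OSError in Python 3
    error_name = type(err).__name__    # the concrete subclass that was raised
    opened = False
else:
    opened = True
    handle.close()
```

Catching the exception lets the program report the problem instead of stopping with a traceback.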
In Python, a file operation takes place in
the following order:
1) Open a file
2) Read or write (perform operation)
3) Close the file
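The three steps above can be sketched as follows. The file name demo.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")          # 1) open the file for writing
f.write("hello, file\n")     # 2) perform the operation
f.close()                    # 3) close the file

f = open(path, "r")          # same three steps, this time for reading
content = f.read()
f.close()
```

In practice, a with statement (with open(path) as f: ...) is often preferred because it closes the file even if an error occurs in step 2.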
FILES
Types of Data Files
Text File- A text file can be understood as a sequence of
characters consisting of letters, digits and other
special symbols. Files with extensions like .txt, .py, .csv,
etc. are some examples of text files.
Each line of a text file is terminated by a special
character, called the End of Line (EOL). For example,
the default EOL character in Python is the newline (\n).
Binary File-
Binary files are also stored in terms of
bytes (0s and 1s), but unlike text files,
these bytes do not represent the ASCII
values of characters. Rather, they
represent the actual content such as
image, audio, video, compressed versions
of other files, executable files, etc. These
files are not human readable.
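As a small illustration of the difference, the sketch below writes three raw bytes in binary mode and reads them back unchanged. It assumes Python 3, where binary files yield bytes objects; the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.dat")  # hypothetical name

# 'wb' writes the bytes exactly as given, with no character encoding step.
with open(path, "wb") as f:
    f.write(b"A\x00\xff")

# 'rb' returns the same raw bytes.
with open(path, "rb") as f:
    raw = f.read()

values = list(raw)   # the numeric value of each byte
```

Only the first byte (65) corresponds to a printable ASCII character; the others are plain data.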
File Builtin Function-open()
The open() built-in function provides a general interface to
initiate the file input/output (I/O) process.
The open() built-in function (BIF) returns a file object on
successfully opening the file.
When a failure occurs, Python raises an
IOError exception.
The basic syntax of the open() built-in function is:
file_object = open(file_name, access_mode='r', buffering=-1)
The file_name is a string containing the
name of the file to open. It can be a relative
or absolute/full pathname.
The optional access_mode argument is also
a string, consisting of a set of flags
indicating which mode to open the file with.
Generally, files are opened with the modes
'r', 'w', or 'a', representing
read, write, and append, respectively.
Any file opened with mode 'r' must exist. Any
file opened with 'w' will be truncated first if it
exists, and then the file is (re)created.
Any file opened with 'a' will be opened for
append. All writes to files opened with 'a' will be
from end-of-file, even if you seek elsewhere
during access.
If the file does not exist, it will be created,
making it the same as if you opened the file in
'w' mode.
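The behavior of 'w' and 'a' can be checked with a short sketch like the one below. The file name is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical name

with open(path, "w") as f:      # file does not exist yet: created
    f.write("first\n")

with open(path, "w") as f:      # file exists: truncated before writing
    f.write("second\n")

with open(path, "a") as f:      # append: writes go to end-of-file
    f.write("third\n")

with open(path, "r") as f:
    content = f.read()
```

The line "first" is gone because the second open with 'w' truncated the file.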
File Mode   Operation
r           Open for read
w           Open for write (truncate if necessary)
a           Open for append (always works from EOF, create if necessary)
r+          Open for read and write
w+          Open for read and write (see w above)
a+          Open for read and write (see a above)
Access Modes for File Objects (binary)
rb          Open for binary read
wb          Open for binary write
ab          Open for binary append
rb+         Open for binary read and write
wb+         Open for binary read and write
ab+         Open for binary read and write
The optional buffering argument indicates the type of
buffering that should be performed when
accessing the file.
A value of 0 means no buffering should
occur, a value of 1 signals line buffering,
and any value greater than 1 indicates
buffered I/O with the given value as the
buffer size. A negative value (the default)
selects the system default buffering.
File Built-in Methods
File methods come in 4 different categories:
• input
• output
• movement within a file (intra-file motion)
• miscellaneous
Input methods
1) read() -- reads bytes directly
into a string, reading at most the number of bytes
indicated.
syntax: read(size), default size=-1
If no size is given (the default value is the integer -1) or
size is negative, the file will be read to the end.
2) readline() -- reads one line of the open file:
it reads all bytes until a line-terminating character
like NEWLINE is encountered. The line terminator is
kept in the returned string.
Like read(), there is also an optional size option,
which, if not provided, defaults to -1, meaning
read until the line-ending characters (or EOF)
are found.
syntax: readline(size), default size=-1
3) readlines() -- reads all lines until EOF and
returns them as a list of strings.
syntax: readlines(sizehint=0)
4) readable() -- returns True if the file can be read from.
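The input methods above can be compared side by side in a short sketch. It assumes Python 3 and a hypothetical three-line text file created just for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical name
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r") as f:
    can_read = f.readable()        # True for a file opened with 'r'
    first_five = f.read(5)         # at most 5 characters
    rest_of_line = f.readline()    # remainder of the first line
    next_line = f.readline()       # one whole line, terminator included
    remaining = f.readlines()      # every line left, as a list
    at_eof = f.read()              # nothing left to read
```

Each call continues from where the previous one stopped, so the results do not overlap.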
Output methods
1) write() -- takes a string that can consist of one
or more lines of text data, or a block of bytes, and
writes the data to the file.
syntax: write(string)
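2) writelines() -- takes a list of strings and
writes them out to a file. Line terminators are not
added automatically, so each string must already end
with one if separate lines are wanted.
syntax: writelines(list)
3) writable() -- returns True if the file can be written to.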
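A brief sketch of the output methods, assuming Python 3. The file name and the sample strings are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical name
lines = ["one", "two", "three"]

with open(path, "w") as f:
    can_write = f.writable()                      # True for mode 'w'
    f.write("header\n")                           # a single string
    f.writelines(line + "\n" for line in lines)   # newlines added by hand

with open(path, "r") as f:
    written = f.read()
```

Without the explicit "\n", writelines() would have produced "onetwothree" on a single line.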
Intra-file motion
seek() -- moves the file pointer to
different positions within the file.
The seek() method sets the current file position
in a file stream.
syntax: seek(offset, whence=0)
offset: number of bytes to move, relative to the reference point
whence: defines the point of reference
whence = 0 sets the reference point at the beginning of
the file (the default)
whence = 1 sets the reference point at the current file
position
whence = 2 sets the reference point at the end of the file
tell() -- returns the current file position in a
file stream
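The three whence values can be tried out on a ten-byte file. This sketch opens the file in binary mode, since Python 3 restricts relative seeks on text files; the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")  # hypothetical name
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)                 # whence=0: 3 bytes from the beginning
    a = f.read(2)             # reads bytes 3 and 4
    pos_after_read = f.tell() # position is now 5
    f.seek(2, 1)              # whence=1: 2 bytes forward from position 5
    b = f.read(1)             # reads byte 7
    f.seek(-2, 2)             # whence=2: 2 bytes before the end
    c = f.read()              # reads the last two bytes
    end = f.tell()            # position equals the file size
```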
Miscellaneous
file.close() -- Closes file
file.fileno() -- Returns integer file descriptor (FD) for file
file.flush() -- Flushes internal buffer for file
file.isatty() -- Returns whether the file stream is interactive or not
File Built-in Attributes
file.closed -- True if file is closed and
False otherwise
file.encoding -- Encoding that this file uses;
when Unicode strings are written to file, they
will be converted to byte strings using
file.encoding; a value of None indicates
that the system default encoding for
converting Unicode strings should be used
file.mode -- Access mode with which file was
opened
file.name -- Name of file
file.newlines -- None if no line separators have
been read, a string consisting of one type of
line separator, or a tuple containing all types
of line termination characters read so far
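The miscellaneous methods and the attributes above can be inspected on any open file. The sketch below assumes Python 3 and a hypothetical file in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "attrs.txt")  # hypothetical name

f = open(path, "w")
name = f.name                 # the path that was passed to open()
mode = f.mode                 # 'w'
closed_before = f.closed      # False while the file is open
f.write("buffered text")
f.flush()                     # push buffered data to the operating system
descriptor = f.fileno()       # integer file descriptor
interactive = f.isatty()      # False: a disk file is not a terminal
f.close()
closed_after = f.closed       # True once close() has been called
```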
Standard Files
There are generally three standard files
that are made available when a program
starts. These are
standard input (usually the keyboard),
standard output (buffered output to the
monitor or display),
and standard error (unbuffered output to
the screen).
stdin, stdout, and stderr
Import the sys module to access these files as
sys.stdin,
sys.stdout, and
sys.stderr.
The print statement (the print() function in Python 3)
normally outputs to sys.stdout,
while the raw_input() built-in function (input() in Python 3)
receives its input from sys.stdin.
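Because the standard files are ordinary file objects, they can be passed around and written to like any other file. In this sketch the helper report() is a hypothetical function written for the example, and io.StringIO stands in for a file so the output can be inspected.

```python
import io
import sys

def report(message, stream=sys.stdout):
    """Write one line of text to the given file-like object."""
    stream.write(message + "\n")

report("normal output")                            # goes to standard output
report("something went wrong", stream=sys.stderr)  # goes to standard error

buffer = io.StringIO()                             # in-memory stand-in for a file
report("captured", stream=buffer)
captured = buffer.getvalue()
```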
Command Line Arguments
The sys module also provides access to any
command-line arguments via sys.argv.
Command-line arguments are those
arguments given to the program in addition
to the script name on invocation.
The names "argc" and "argv" stand for
"argument count" and "argument vector," respectively.
The argv variable contains an array of strings
consisting of each argument from the
command line,
while the argc variable contains the number
of arguments entered.
In Python, sys.argv is the list of command-line
arguments, and
len(sys.argv) is the number of command-line
arguments (aka argc). The script name itself is sys.argv[0].
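A small sketch of how a script might split up its argument list. The helper summarize() and the sample command line are hypothetical; a real script would pass sys.argv.

```python
import sys

def summarize(argv):
    """Return (argc, script_name, remaining_arguments) for an argv-style list."""
    return len(argv), argv[0], argv[1:]

# Hypothetical invocation: python copy.py src.txt dst.txt
argc, script, args = summarize(["copy.py", "src.txt", "dst.txt"])

# The live values for the running program are available the same way.
live_argc = len(sys.argv)
```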
File System
Access to the file system occurs mostly through the
Python os module.
The os module serves as the primary interface to the
operating system facilities and services from Python.
In addition to managing processes and the process
execution environment, the os module performs most of
the major file system operations.
These features include removing and renaming
files, traversing the directory tree, and managing
file accessibility.
os Module File/Directory Access Functions
File Processing Functions
mkfifo() / mknod() -- Create named pipe / create filesystem node
remove() / unlink() -- Delete file
rename() -- Rename file
stat() -- Return file statistics
utime() -- Update timestamp
tmpfile() -- Create and open ('w+b') new temporary file (Python 2 only)
walk() -- Generate filenames in a directory tree
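Several of the file processing functions can be combined in a short sketch. The file names are hypothetical and everything happens inside a temporary directory.

```python
import os
import tempfile

root = tempfile.mkdtemp()
old = os.path.join(root, "draft.txt")   # hypothetical names
new = os.path.join(root, "final.txt")

with open(old, "w") as f:
    f.write("12345")

os.rename(old, new)                     # rename the file
size = os.stat(new).st_size             # one of the statistics from stat()

found = []
for dirpath, dirnames, filenames in os.walk(root):
    found.extend(filenames)             # collect every file name in the tree

os.remove(new)                          # delete the file
still_there = os.path.exists(new)
```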
Directory Functions
chdir() / fchdir() -- Change working directory
chroot() -- Change root directory of current process
listdir() -- List files in directory
getcwd() / getcwdu() -- Return current working directory / same but in Unicode (getcwdu() is Python 2 only)
mkdir() / makedirs() -- Create directory(ies)
rmdir() / removedirs() -- Remove directory(ies)
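The directory functions can be exercised as in the sketch below. The directory names are hypothetical, and the working directory is restored before anything is removed.

```python
import os
import tempfile

root = tempfile.mkdtemp()
start = os.getcwd()                              # remember where we began

os.mkdir(os.path.join(root, "single"))           # one directory
os.makedirs(os.path.join(root, "a", "b", "c"))   # a whole chain at once
entries = sorted(os.listdir(root))

os.chdir(root)                                   # change working directory
moved = os.path.samefile(os.getcwd(), root)
os.chdir(start)                                  # go back before cleaning up

os.rmdir(os.path.join(root, "single"))           # remove one empty directory
# removedirs() removes c, then prunes each parent that has become empty.
os.removedirs(os.path.join(root, "a", "b", "c"))
chain_gone = not os.path.exists(os.path.join(root, "a"))
```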
Access/Permissions
access() -- Verify permission modes
chmod() -- Change permission modes
umask() -- Set default permission modes
Permission constants from the stat module, for use with chmod():
stat.S_IREAD − Read by owner.
stat.S_IWRITE − Write by owner.
stat.S_IEXEC − Execute by owner.
stat.S_IRWXU − Read, write, and execute
by owner.
stat.S_IRGRP − Read by group.
stat.S_IXGRP − Execute by group.
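A minimal sketch of changing and inspecting permissions. The file name is hypothetical; the permission bits are read back with os.stat() rather than trusted from the call alone.

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locked.txt")  # hypothetical name
with open(path, "w") as f:
    f.write("data")

exists = os.access(path, os.F_OK)             # does the path exist?

os.chmod(path, stat.S_IREAD)                  # owner may read, nothing else
mode = stat.S_IMODE(os.stat(path).st_mode)    # permission bits only
owner_can_read = bool(mode & stat.S_IREAD)
owner_can_write = bool(mode & stat.S_IWRITE)

# Restore write permission so the file can be cleaned up everywhere.
os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
os.remove(path)
```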
The os.path module performs specific pathname
operations.
The module is accessible through the os
module.
Included with this module are functions to
manage and manipulate file pathname
components, obtain file or directory
information, and make file path inquiries.
os.path Access Functions
Separation
basename() -- Remove directory path and return leaf name
dirname() -- Remove leaf name and return directory path
split() -- Return (dirname(), basename()) tuple
splitdrive() -- Return (drivename, pathname) tuple
splitext() -- Return (filename, extension) tuple
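The separation functions work on the path string alone, so the file does not need to exist. The path below is a hypothetical example.

```python
import os.path

path = "/home/user/notes.txt"               # hypothetical path

leaf = os.path.basename(path)               # file name without the directory
directory = os.path.dirname(path)           # directory without the file name
pair = os.path.split(path)                  # both parts at once
stem, extension = os.path.splitext(leaf)    # name and extension
```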
Information
getatime() -- Return last file access time
getctime() -- Return file creation time (on Unix, the time of the last metadata change)
getmtime() -- Return last file modification time
getsize() -- Return file size (in bytes)
Inquiry
exists() -- Does pathname (file or directory) exist?
isabs() -- Is pathname absolute?
isdir() -- Does pathname exist and is a directory?
isfile() -- Does pathname exist and is a file?
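Unlike the separation functions, the information and inquiry functions consult the file system. This sketch uses a hypothetical file created in a temporary directory.

```python
import os
import tempfile

root = tempfile.mkdtemp()
path = os.path.join(root, "report.txt")      # hypothetical name
with open(path, "w") as f:
    f.write("abc")

checks = {
    "exists": os.path.exists(path),
    "isfile": os.path.isfile(path),
    "isdir": os.path.isdir(root),
    "isabs": os.path.isabs(path),
    "missing": os.path.exists(path + ".bak"),   # never created
}
size = os.path.getsize(path)                 # size in bytes
modified = os.path.getmtime(path)            # seconds since the epoch
```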
File Execution
There are multiple ways in Python to run
other pieces of code outside of the currently
executing program, e.g.,
run an operating system command or another
Python script, or
execute a file on disk or across the network.
File Execution
Some specific execution scenarios could
include:
● Remain executing within our current script
● Create and manage a subprocess
● Execute an external command or program
● Execute a command that requires input
● Invoke a command across the network
● Execute a command creating output that
requires processing
● Execute another Python script
● Execute a set of dynamically generated
Python statements
● Import a Python module (and execute its
top-level code)
Python's execution environment consists of
"callable" objects (functions, methods,
classes, some class instances) and
code objects, which consist of Python
statements, assignments, expressions, and
even modules.
If any Python code is to be executed, that
code must first be converted to
byte-compiled code.
Executable object statements and built-in
functions: compile(), callable(), eval(), exec
Executing other Python programs: import
Executing other (non-Python) programs:
the os module
Terminating execution: sys.exit()
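A short sketch of the executable-object tools listed above, assuming Python 3, where exec is a built-in function rather than a statement. The source strings are made up for the example.

```python
# compile() turns source text into a code object (byte-compiled code).
expr_code = compile("2 + 3 * 4", "<string>", "eval")
result = eval(expr_code)            # evaluate an expression code object

stmt_code = compile("total = sum(range(5))", "<string>", "exec")
namespace = {}
exec(stmt_code, namespace)          # run statements inside a separate namespace

is_callable = callable(len)         # functions are callable
not_callable = callable(42)         # plain integers are not
```

Running the statements in their own dictionary keeps dynamically generated code from touching the caller's variables.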
Persistent Storage Modules
Persistent storage is a way to archive data so that we may
access it at a later time instead of
having to re-enter all of that information.
When simple disk files are no longer
acceptable and full relational database
management systems (RDBMSs) are
overkill, simple persistent storage fills the
gap.
The majority of the persistent storage
modules deal with storing strings of data,
but there are ways to archive Python objects
as well.
pickle and marshal Modules
Python provides a variety of modules that
implement minimal persistent storage
One set of modules (marshal and pickle)
allows for pickling of Python objects.
Pickling is the process whereby objects more
complex than primitive types can be converted to a
binary set of bytes that can be stored or
transmitted across the network, then be converted
back to their original object forms.
Pickling is also known as flattening, serializing, or
marshalling.
Another set of modules (dbhash/bsddb, dbm,
gdbm, dumbdbm) and their "manager" (anydbm)
can provide persistent storage of Python strings
only. (These are the Python 2 names; Python 3 groups
the DBM modules under the dbm package.)
The shelve module can do both.
Both marshal and pickle can flatten Python objects.
These modules do not provide "persistent storage" as such,
since they do not provide a namespace for the objects,
nor can they provide concurrent write access to
persistent objects.
What they can do is pickle Python objects to allow
them to be stored or transmitted.
Storage is sequential in nature.
The difference between marshal and pickle is that
marshal can handle only simple Python objects
(numbers, sequences, mappings, and code),
while pickle can transform recursive objects,
objects that are multi-referenced from different places,
and user-defined classes and instances.
DBM-style Modules
The *db* series of modules writes data in the
traditional DBM format.
There are a large number of different
implementations:
dbhash/bsddb,
dbm, gdbm,
dumbdbm.
The generic anydbm module detects which
DBM-compatible modules are installed on your system
and uses the "best" one.
The dumbdbm module is the most limited one, and
is the default used if none of the other
packages is available.
These modules do provide a namespace for our
objects.
The one limitation of these systems is that they can
store only strings.
They do not serialize Python objects.
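A minimal sketch of DBM-style storage. It assumes Python 3, where the dumbdbm module described above is available as dbm.dumb; the database name and the entries are hypothetical.

```python
import dbm.dumb
import os
import tempfile

base = os.path.join(tempfile.mkdtemp(), "phonebook")  # hypothetical name

db = dbm.dumb.open(base, "c")      # 'c': create the database if needed
db["alice"] = "555-0100"           # keys act as the namespace
db["bob"] = "555-0199"
db.close()

db = dbm.dumb.open(base, "r")      # reopen read-only
alice = db["alice"]                # values come back as byte strings
keys = sorted(db.keys())
db.close()
```

Only strings (stored as bytes) go in and out; a list or dictionary would have to be serialized first.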
Shelve Module
The shelve module uses the anydbm module to find
a suitable DBM module, then uses cPickle (pickle in Python 3) to
perform the pickling process.
The shelve module permits concurrent read access
to the database file, but not shared read/write access.
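A short sketch of shelve combining both ideas: string keys as in a DBM file, with pickled Python objects as values. It assumes Python 3; the shelf name and its contents are hypothetical.

```python
import os
import shelve
import tempfile

base = os.path.join(tempfile.mkdtemp(), "inventory")  # hypothetical name

# Store whole Python objects under string keys.
with shelve.open(base) as shelf:
    shelf["fruits"] = ["apple", "pear"]
    shelf["counts"] = {"apple": 3, "pear": 5}

# Reopen the shelf later and get the objects back in their original form.
with shelve.open(base) as shelf:
    fruits = shelf["fruits"]
    counts = shelf["counts"]
```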
Pickle module
The pickle module allows you to store
Python objects directly to a file without
having to convert them to strings or to
necessarily write them out as binary files
using low-level file access.
The pickle module creates a Python-only
binary version that allows you to cleanly
read and write objects in their entirety
without having to handle all the file details.
A valid file handle is required to read or
write objects from or to disk.
The two main functions in the pickle module
are dump() and load().
The dump() function takes a data object and a
file handle, and saves the object in a format it
understands to the given file.
When a pickled object is loaded from disk
using load(), it is restored to the
configuration it had before it was saved to
disk.
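The dump() and load() round trip can be sketched as follows, assuming Python 3, where the file handle must be opened in binary mode. The record and the file name are made up for the example.

```python
import os
import pickle
import tempfile

record = {"name": "Ada", "scores": [90, 85], "active": True}  # sample object
path = os.path.join(tempfile.mkdtemp(), "record.pkl")         # hypothetical name

with open(path, "wb") as f:       # binary write handle
    pickle.dump(record, f)        # object first, then the file handle

with open(path, "rb") as f:       # binary read handle
    restored = pickle.load(f)     # rebuilds an equal, independent object
```

Pickle files should only be loaded from trusted sources, since loading can execute arbitrary code.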
Related Modules
base64 -- Encoding/decoding of binary strings to/from text strings
binascii -- Encoding/decoding of binary and ASCII-encoded binary strings
filecmp -- Compares directories and files
fileinput -- Iterates over lines of multiple input text files
getopt / optparse -- Provides command-line argument parsing/manipulation
gzip / zlib -- Reads and writes GNU zip (gzip) files (needs zlib module for compression)
shutil -- Offers high-level file access functionality
tempfile -- Generates temporary file names or files
zipfile -- Tools and utilities to read and write ZIP archive files