
UNIT-03 FILES

The document provides an overview of file handling in Python, detailing how to open, read, write, and close files, as well as the importance of file persistence. It introduces the 'pickle' module for serializing Python objects and explains basic file operations such as renaming and deleting files using the 'os' module. Additionally, it covers error handling when dealing with files and demonstrates reading from URLs using the 'urllib' module.

Uploaded by

Gavi Kiran
Copyright © All Rights Reserved

FILES

UNIT 03
Files

• A file is simply a sequence of characters stored on your computer or network.
• One of the things that makes a file different from a string or a list of characters is that the file exists even after a program ends.
• This makes a file useful for maintaining information that must be remembered for a long period of time.
• Within a Python program a file is represented by a file object, the value returned by open.
• This value does not actually hold the contents of the file; rather, the value is a portal through which the program can access the contents of the file.
A file value is used in three distinct steps.

• First, the file is opened. This establishes the link between the file value in the Python program and the information stored on the disk.
• Next, values are read from or written to the file. The process involves bringing characters in from the disk and storing them in a string in the Python program, or alternatively taking the contents of a string in your program and writing them out to the disk.
• When all values have been either read or written, the last step is to close the file.
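The three steps above can be sketched as follows. This is a minimal illustration, not part of the original slides: the temporary directory and the two lines of sample text are assumptions chosen so that the example runs anywhere.

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "peas.txt")

# Step 1: open the file (here for writing).
f = open(path, "w")
# Step 2: write the contents of a string out to the disk.
f.write("pease porridge hot\npease porridge cold\n")
# Step 3: close the file.
f.close()

# The same three steps, this time reading the characters back into a string.
f = open(path)
text = f.read()
f.close()

print(text)
```

After the second close, text holds exactly the characters that were written, which shows that the data lived on the disk between the two open calls.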
Python String rstrip() Method

The rstrip() method removes any trailing characters (characters at the end of a string). Whitespace is the default trailing character to remove.

• Remove any white space at the end of the string:

txt = " Orange "
x = txt.rstrip()
print("of all fruits", x, "is my favorite")

Output (the leading space in txt is kept, only the trailing one is removed):

of all fruits  Orange is my favorite

• Remove the trailing characters if they are commas, periods, s, q, or w:

txt = "Orange,,,,,ssqqqww....."
x = txt.rstrip(",.qsw")
print(x)

Output:

Orange
Warning! Opening a file for write removes old values

Copy the four lines given earlier into a file named peas2.txt. Then try executing the following two lines:

>>> f = open("peas2.txt", "w")
>>> f.close()

Open the file with Notepad or some other text editor. What has happened? Remember, opening a file for writing causes the old values in the file to be deleted. This is true even if no new values are written into the file.
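The effect can also be checked from inside Python. The sketch below is an illustration under assumptions: it uses a stand-in file in a temporary directory with made-up contents, and it additionally shows append mode ("a"), which the slides do not cover, as one way to add to a file without losing what is already there.

```python
import os
import tempfile

# Hypothetical stand-in for peas2.txt, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "peas2.txt")

with open(path, "w") as f:
    f.write("line one\nline two\n")
size_before = os.path.getsize(path)

# Opening for write truncates the file, even though nothing is written.
f = open(path, "w")
f.close()
size_after_write_mode = os.path.getsize(path)

# Append mode ("a") keeps the old contents and adds to the end instead.
with open(path, "a") as f:
    f.write("line three\n")
with open(path) as f:
    contents = f.read()

print(size_before, size_after_write_mode, repr(contents))
```

The size drops to zero as soon as the file is opened with "w", so only the appended line remains at the end.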
Rewriting Word Count Program

def freqCount(f):  # f is a file of input
    freq = {}
    line = f.readline()
    while line:
        words = line.split()
        for word in words:
            freq[word] = freq.get(word, 0) + 1
        line = f.readline()
    return freq

def main():
    f = open("text.txt")
    freq = freqCount(f)
    f.close()
    # now all words have been read
    for word in freq:
        print(word + ' occurs ' + str(freq[word]) + ' times')
Operating System Commands

• The operating system (such as Windows, Mac, or Unix) is normally in charge of the management of files.
• There are a number of useful operating system commands that can be executed from within a Python program by importing the os module.
• The two most useful commands are os.remove(name), which deletes (removes) the named file, and os.rename(oldname, newname), which renames a file.

>>> import os
>>> os.remove("gone.txt") # delete file named gone.txt
>>> os.rename("fred.txt", "alice.txt") # fred becomes alice
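The two commands can be tried safely in a scratch directory. The sketch below is an illustration only: the temporary directory and the use of os.path.exists to check the result are assumptions added here, while fred.txt and alice.txt are the names from the example above.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()          # scratch directory for the demonstration
fred = os.path.join(workdir, "fred.txt")
alice = os.path.join(workdir, "alice.txt")

f = open(fred, "w")
f.write("hello\n")
f.close()

os.rename(fred, alice)                # fred becomes alice
renamed = os.path.exists(alice) and not os.path.exists(fred)

os.remove(alice)                      # delete the file
removed = not os.path.exists(alice)

print(renamed, removed)
```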
Files and for

• A file value can be used in a for statement. The resulting loop reads from the file line by line, and assigns each line to the for variable.

f = open("peas.txt")
for line in f:
    print(line.rstrip("\n")[::-1])  # print each line reversed

• This can often make programs that manipulate files considerably shorter than the equivalent form using a while statement. For example, our frequency counting function is reduced to the following:

def freqCount(f):  # f is a file of input
    freq = {}
    for line in f:
        words = line.split()
        for word in words:
            freq[word] = freq.get(word, 0) + 1
    return freq
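The for-loop version of the frequency counter can be tried without creating a file on disk. The sketch below assumes io.StringIO as a stand-in for an open text file and uses two made-up lines of input; neither appears in the original slides.

```python
import io

def freqCount(f):  # f is any file-like object
    freq = {}
    for line in f:
        for word in line.split():
            freq[word] = freq.get(word, 0) + 1
    return freq

# io.StringIO behaves like an open text file, so no file on disk is needed.
sample = io.StringIO("the cat\nthe hat\n")
freq = freqCount(sample)
for word in sorted(freq):
    print(word + " occurs " + str(freq[word]) + " times")
```

Because the function only relies on iterating over lines, it works equally well with a real file, an in-memory file, or standard input.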
Recovering from Exceptions

• What happens if you try to open a file that does not exist? As you might expect, the Python system complains, and responds by throwing an exception, an IOError.
• Normally the exception causes execution to halt with an error message. A try statement lets the program catch the exception and recover instead.

try:
    f = open("input.txt")
except IOError as e:
    print('unable to open the file input.txt')
else:
    … # do something with file f
    f.close()
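One way to use this pattern is to wrap it in a function that falls back to a default value. The sketch below is an assumption-laden illustration: read_or_default is a hypothetical helper name, and the missing file lives in a temporary directory so that the failure is guaranteed.

```python
import os
import tempfile

def read_or_default(name, default=""):
    """Return the contents of the named file, or default if it cannot be opened."""
    try:
        f = open(name)
    except IOError as e:  # in Python 3, IOError is another name for OSError
        print("unable to open the file", name, "-", e.strerror)
        return default
    else:
        text = f.read()
        f.close()
        return text

# Hypothetical path that does not exist.
missing = os.path.join(tempfile.mkdtemp(), "input.txt")
result = read_or_default(missing, default="(no input)")
print(result)
```

The else branch runs only when open succeeds, so f is never used unless it was actually assigned.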
Standard I/O

• The print function and the input function (raw_input in Python 2) are actually special uses of more general file commands.
• The print function writes characters to a file that is normally attached to a display window, while the input function reads from a file that is attached to the user keyboard.
• These files are available in the sys module as sys.stdout and sys.stdin. Error messages are written to a third file, sys.stderr.

import sys

def main():
    # invoke frequency program, reading from console input
    freq = freqCount(sys.stdin)
    # now all words have been read
    for word in freq:
        print(word + ' occurs ' + str(freq[word]) + ' times')
• A more subtle use of the sys module is to change the variables sys.stdin, sys.stdout, and sys.stderr, thereby altering the effect of the standard functions.
• To see an example, try executing the following program, and then examine the files output.txt and error.txt.

import sys
sys.stdout = open('output.txt', 'w')
sys.stderr = open('error.txt', 'w')
print("see where this goes")
print(5/4)
print(7.0/0)  # raises ZeroDivisionError; the traceback goes to error.txt
sys.stdout.close()  # not reached because of the error above
sys.stderr.close()

• There are several other functions and variables defined in the sys module. The function sys.exit("message") can be used to terminate a running Python program.
• The variable sys.argv is a list of the command-line options passed to a program.
• On systems that support command-line arguments these are often used to pass information, such as file names, into a program. Assume that echo.py is the following simple program:

import sys
print(sys.argv)

The following might be an example execution:

$ python echo.py abc def
['echo.py', 'abc', 'def']
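Reassigning sys.stdout permanently, as in the output.txt program, affects every later print call. A gentler variation is to replace it only for a moment and then put the original back. The sketch below is an illustration under assumptions: capture_print is a hypothetical helper, and an in-memory io.StringIO is used in place of a file on disk.

```python
import io
import sys

def capture_print(message):
    """Print message while sys.stdout is temporarily replaced; return what was written."""
    saved = sys.stdout            # remember the original standard output
    buffer = io.StringIO()        # an in-memory file
    sys.stdout = buffer
    try:
        print(message)            # goes to buffer, not the display
    finally:
        sys.stdout = saved        # always restore the original
    return buffer.getvalue()

captured = capture_print("see where this goes")
print(repr(captured))
```

The try/finally pair guarantees that the display is restored even if the code in between raises an exception.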
Persistence and Pickle

• There is an alternative module that is also useful in saving and restoring the values of Python variables.
• This module is, somewhat humorously, known as pickle. (When you pickle a fruit or vegetable you are saving it for long-term storage.)
• A more common name for pickling is serialization.

1. The pickle module supplies two functions, dump and load. These can be used to save the contents of most Python variables to a file and later restore their values.
2. The following is an example:

import pickle
…
obj = ... # create some Python value
f = open(filename, 'wb')  # pickle files are binary
pickle.dump(obj, f)
f.close()

Later, perhaps in a different program or at a different time, the contents of the variable can be retrieved from the file as follows:

import pickle

f = open(filename, 'rb')
obj = pickle.load(f)
f.close()

Multiple objects can be saved and restored in the same file. However, the user is responsible for remembering the order in which values were saved. Most Python objects can be saved and restored using pickle and/or shelve.
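Saving several objects in one file, and restoring them in the same order, might look like the sketch below. The file name, the temporary directory, and the two sample values are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical file name in a temporary directory.
filename = os.path.join(tempfile.mkdtemp(), "saved.pkl")

names = ["fred", "alice"]
counts = {"peas": 2, "porridge": 1}

# Save two objects, one after the other. Pickle files are binary.
f = open(filename, "wb")
pickle.dump(names, f)
pickle.dump(counts, f)
f.close()

# Restore them in the same order in which they were saved.
f = open(filename, "rb")
names2 = pickle.load(f)
counts2 = pickle.load(f)
f.close()

print(names2, counts2)
```

Each call to pickle.load reads exactly one saved object, which is why the order of the load calls has to match the order of the dump calls.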
Example – File Sort

• Consider the main program. Let us assume that the input is contained in the file input.txt, and the output should go into file output.txt.
• At a high level, we can describe the algorithm as follows:

import os
# step 1: make all the temporary files
try:
    fin = open("input.txt")
except IOError:
    print('unable to open input.txt')
else:
    tlist = makeTempFiles(fin)
    fin.close()
    # step 2: merge temp files
    while len(tlist) > 1:
        mergeTwoIntoOne(tlist)
    # step 3: rename the remaining temp file
    tname = tlist.pop()
    os.rename(tname, "output.txt")
def makeTempFiles(fin):
    # read from fin and break into temp files
    tnames = []  # make empty list of temp files
    done = False
    while not done:
        tn = makeTempFileName()
        tnames.append(tn)
        fn = open(tn, "w")
        lines = []
        i = 0
        while not done and i < 100:
            i = i + 1
            line = fin.readline()
            if line:
                lines.append(line)
            else:
                done = True
        lines.sort()  # sort the (up to) 100 lines just read
        fn.writelines(lines)
        fn.close()
    return tnames

def mergeTwoIntoOne(tlist):
    ta = tlist.pop(0)  # first file name
    tb = tlist.pop(0)  # second file name
    tn = makeTempFileName()  # make output file name
    tlist.append(tn)
    fa = open(ta)
    fb = open(tb)
    fn = open(tn, "w")
    mergeFiles(fa, fb, fn)
    fa.close()
    fb.close()
    os.remove(ta)  # remove temp files
    os.remove(tb)
    fn.close()
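The helper makeTempFileName is called above but not shown. One possible way to write it, offered here purely as an assumption and not as the original author's version, is to let the standard tempfile module pick a unique name:

```python
import os
import tempfile

def makeTempFileName():
    """Return the name of a new, empty temporary file (one possible approach)."""
    fd, name = tempfile.mkstemp(suffix=".txt")  # creates the file and opens it
    os.close(fd)                                # only the name is needed here
    return name

a = makeTempFileName()
b = makeTempFileName()
print(a != b)
os.remove(a)
os.remove(b)
```

Note that os.rename can fail when the temporary directory and the destination are on different file systems. If that is a concern, passing dir="." to tempfile.mkstemp keeps the temporary files next to output.txt.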
def mergeFiles(fa, fb, fn):
    # merge the contents of fa and fb into fn
    # step 1: merge as long as both files have lines
    linea = fa.readline()
    lineb = fb.readline()
    while linea and lineb:
        if linea < lineb:
            fn.write(linea)
            linea = fa.readline()
        else:
            fn.write(lineb)
            lineb = fb.readline()
    # step 2: write remaining lines
    # only one of the following will do anything
    while linea:
        fn.write(linea)
        linea = fa.readline()
    while lineb:
        fn.write(lineb)
        lineb = fb.readline()

• We have started from a high-level description of the original problem, reduced each task to smaller problems, and then repeatedly addressed each of the smaller problems until everything is reduced to simple Python statements.
• All that is left is putting together the pieces, and verifying that it works as it should.
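As a first step toward verifying that the pieces work, the merge function can be exercised on its own. The sketch below repeats mergeFiles so that it is self-contained, and assumes io.StringIO objects with made-up sorted lines in place of real temporary files.

```python
import io

def mergeFiles(fa, fb, fn):
    # merge the sorted lines of fa and fb into fn
    linea = fa.readline()
    lineb = fb.readline()
    while linea and lineb:
        if linea < lineb:
            fn.write(linea)
            linea = fa.readline()
        else:
            fn.write(lineb)
            lineb = fb.readline()
    # copy whatever remains; only one of these loops will do anything
    while linea:
        fn.write(linea)
        linea = fa.readline()
    while lineb:
        fn.write(lineb)
        lineb = fb.readline()

# Two already-sorted inputs, held in memory rather than on disk.
fa = io.StringIO("apple\ncherry\n")
fb = io.StringIO("banana\ndate\nfig\n")
fn = io.StringIO()
mergeFiles(fa, fb, fn)
merged = fn.getvalue()
print(merged)
```

The merged result contains all five lines in sorted order. The function relies on both inputs being sorted already, which is what makeTempFiles guarantees for each temporary file.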
Reading from a URL

• The urllib module provides a simple way to read the contents of a file stored at a specific URL.
• Its urlopen function returns an object that uses the same interface as a file.

import urllib.request
remotefile = urllib.request.urlopen("http://www.python.org")
line = remotefile.readline()
while line:
    print(line.decode("utf-8"))  # lines arrive as bytes
    line = remotefile.readline()

• The urllib module effectively hides all the details of network access, allowing the programmer to just think about what they want to do with all that data.
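The same readline loop can be packaged as a function and tried without a network connection. The sketch below rests on several assumptions: read_lines is a hypothetical helper, the text is assumed to be UTF-8, and a file: URL pointing at a local temporary file stands in for a web address so that the example runs offline.

```python
import os
import pathlib
import tempfile
import urllib.request

def read_lines(url):
    """Return the lines found at url as a list of strings (assumes UTF-8 text)."""
    remotefile = urllib.request.urlopen(url)
    lines = []
    line = remotefile.readline()
    while line:
        lines.append(line.decode("utf-8"))  # urlopen yields bytes
        line = remotefile.readline()
    remotefile.close()
    return lines

# Build a file: URL for a local file so the example runs offline.
path = os.path.join(tempfile.mkdtemp(), "page.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\nsecond line\n")
url = pathlib.Path(path).as_uri()

lines = read_lines(url)
print(lines)
```

Passing an http or https address to read_lines instead should work the same way, since urlopen returns a file-like object in both cases.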
