13 BSC CS - Python - Chapter 5

The document provides an overview of file handling in Python, including how to open, read, write, and close files using various modes such as 'r', 'w', and 'a'. It also covers the use of the 'with' statement for automatic file management, as well as handling binary files and using the pickle module for serialization. Additionally, it discusses working with directories and zipping/unzipping files using the zipfile module.

Uploaded by

HARSH MISHRA
13 B.Sc. CS Sem - I
5. Files and Regular Expressions

Opening a File

In Python, files are opened using the open() function, which returns a file object.

When opening a file, you specify the file path and the mode in which you want to open it (e.g., read, write, append).

The most commonly used modes include:

● 'r': Read (default) - Opens the file for reading; file must exist.
● 'w': Write - Opens the file for writing; creates a new file if it doesn't exist or truncates it if it does.
● 'a': Append - Opens the file for appending data; creates a new file if it doesn't exist.
● 'r+': Read and Write - Opens the file for both reading and writing; file must exist.

Syntax:
    file_object = open("file_path", "mode")

Example Code:
    # Opening a file named "example.txt" in read mode
    file = open("example.txt", "r")

Closing a File

After performing file operations, it's essential to close the file to free up resources and ensure all changes are saved.

This is done using the .close() method on the file object.

Syntax:
    file_object.close()

Example Code:
    # Closing the file after opening it
    file.close()

Using with Statement for File Management

Instead of manually closing files, Python offers the with statement for automatic file handling.

Using with, the file is automatically closed once the code block inside with is executed.

Example Code:
    with open("example.txt", "r") as file:
        content = file.read()  # Perform file operations here
    # No need to close the file manually; it's done automatically

Examples of Opening and Closing Files

Example 1: Opening a File in Write Mode and Writing Data
    # Open a file in write mode
    file = open("output.txt", "w")
    file.write("Hello, World!")  # Writing to the file
    file.close()                 # Closing the file

Example 2: Appending Data to an Existing File
    # Open a file in append mode
    with open("output.txt", "a") as file:
        file.write("\nWelcome to Python!")  # Appending data

Example 3: Reading from a File and Closing It Manually
    # Open a file in read mode
    file = open("output.txt", "r")
    content = file.read()  # Reading the content
    print(content)
    file.close()  # Closing the file

Example 4: Using with to Read a File Automatically
    with open("output.txt", "r") as file:
        for line in file:
            print(line.strip())  # Printing each line

Example 5: Handling File Not Found Error
    try:
        with open("non_existent_file.txt", "r") as file:
            print(file.read())
    except FileNotFoundError:
        print("File not found! Please check the file path.")

Working with Text Files

Text files store data in human-readable format and are typically used for storing plain text.

In Python, you can work with text files using modes such as 'r' for reading, 'w' for writing, and 'a' for appending.

Content is usually accessed or manipulated as strings.

Reading from a Text File:

● .read(): Reads the entire content of the file.
● .readline(): Reads one line at a time.
● .readlines(): Reads all lines and returns them as a list.

Syntax:
    with open("example.txt", "r") as file:
        content = file.read()  # Reads all content at once

Example Code:
    # Reading from a text file
    with open("example.txt", "r") as file:
        print(file.read())

Writing to a Text File:

When writing to a file using 'w' mode, the file is created if it doesn't exist or truncated if it does.

Use 'a' mode to append without overwriting.

Example Code:
    # Writing to a text file
    with open("example.txt", "w") as file:
        file.write("Hello, Python!")

Appending to a Text File:
    # Appending data to a text file
    with open("example.txt", "a") as file:
        file.write("\nAppending a new line.")

Working with Binary Files

Binary files store data in binary format, typically for non-text data like images, audio files, videos, and serialized data.

Binary files are accessed using modes such as 'rb' (read binary) and 'wb' (write binary).

Reading from a Binary File:

Syntax:
    with open("image.png", "rb") as file:
        binary_data = file.read()

Example Code:
    # Reading binary data
    with open("binary_file.dat", "rb") as file:
        data = file.read()
    print(data)

Writing to a Binary File:

Syntax:
    with open("output.dat", "wb") as file:
        file.write(binary_data)

Example Code:
    # Writing binary data
    with open("binary_file.dat", "wb") as file:
        data = b"This is binary data"
        file.write(data)

Examples of Working with Text and Binary Files

Example 1: Reading and Printing Each Line in a Text File
    with open("example.txt", "r") as file:
        for line in file:
            print(line.strip())  # Print each line without extra newline

Example 2: Writing and Reading from a Text File
    # Writing to file
    with open("output.txt", "w") as file:
        file.write("Python file handling is easy.\n")

    # Reading from file
    with open("output.txt", "r") as file:
        print(file.read())

Example 3: Reading Binary Data from a File
    with open("example.png", "rb") as file:
        binary_content = file.read(20)  # Read the first 20 bytes
        print(binary_content)

Example 4: Writing Binary Data to a File
    data = b"Some binary data to write."
    with open("binary_output.dat", "wb") as file:
        file.write(data)

Example 5: Copying a Binary File
    # Copy a binary file
    with open("source_image.jpg", "rb") as src_file:
        with open("copy_image.jpg", "wb") as dest_file:
            dest_file.write(src_file.read())

Pickle in Python

The pickle module in Python is used to serialize and deserialize Python objects.

Serialization (also known as "pickling") is the process of converting a Python object into a byte stream, which can then be saved to a file or transferred over a network.

Deserialization (or "unpickling") is the reverse process, where the byte stream is converted back into a Python object.

pickle is often used to save data structures, machine learning models, or other Python objects to files.

Basic Methods in Pickle:

● pickle.dump(obj, file): Serializes and writes the object obj to a file.
● pickle.load(file): Reads the serialized object from the file and deserializes it back into a Python object.

Example Code:
    import pickle

    # Pickling a Python dictionary
    data = {"name": "Alice", "age": 25, "city": "New York"}
    with open("data.pkl", "wb") as file:
        pickle.dump(data, file)  # Serializing data

    # Unpickling the data
    with open("data.pkl", "rb") as file:
        loaded_data = pickle.load(file)  # Deserializing data
    print(loaded_data)

seek() Method

The seek() method allows you to move the file pointer to a specific position within the file.

This is useful when you want to navigate to a particular part of a file to read or write from that point onward.

    file.seek(offset, from_what)

offset: Number of bytes to move the pointer.

from_what: Reference position (optional, default is 0). The Python documentation calls this parameter whence.

■ 0: Start of the file.
■ 1: Current file position.
■ 2: End of the file.

Example Code:
    with open("example.txt", "rb") as file:
        file.seek(5)        # Move pointer 5 bytes from the start
        print(file.read())  # Read from the new position

tell() Method

The tell() method returns the current position of the file pointer within the file, measured in bytes from the start of the file when the file is opened in binary mode.

This is useful for tracking where the pointer is in the file and understanding where data is being read or written.

Example Code:
    with open("example.txt", "rb") as file:
        file.seek(10)         # Move pointer 10 bytes from the start
        print(file.tell())    # Output: 10 (current position)
        data = file.read(5)   # Read the next 5 bytes
        print(file.tell())    # Output: 15 (new position after reading, if the file has at least 15 bytes)

Examples of Pickle, seek(), and tell()

Example 1: Pickling and Unpickling a List
    import pickle

    # List to be pickled
    numbers = [1, 2, 3, 4, 5]

    # Pickling
    with open("numbers.pkl", "wb") as file:
        pickle.dump(numbers, file)

    # Unpickling
    with open("numbers.pkl", "rb") as file:
        loaded_numbers = pickle.load(file)
    print(loaded_numbers)

Example 2: Moving to a Specific Position in a Text File Using seek()
    with open("sample.txt", "r") as file:
        file.seek(7)         # Move to offset 7 (the 8th character in plain ASCII text)
        print(file.read(5))  # Read 5 characters from the new position

Example 3: Using tell() to Track the Position in a File
    with open("sample.txt", "r") as file:
        print("Initial position:", file.tell())
        file.read(10)
        print("Position after reading 10 characters:", file.tell())

Example 4: Reading File in Reverse Order Using seek()
    with open("sample.txt", "rb") as file:
        file.seek(0, 2)         # Move to the end of the file
        position = file.tell()  # Get end position
        while position > 0:
            position -= 1
            file.seek(position)
            print(file.read(1).decode(), end="")

Example 5: Using Pickle to Save and Load a Complex Object
    import pickle

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

    person = Person("Alice", 30)

    # Saving the person object
    with open("person.pkl", "wb") as file:
        pickle.dump(person, file)

    # Loading the person object
    with open("person.pkl", "rb") as file:
        loaded_person = pickle.load(file)
    print(f"Name: {loaded_person.name}, Age: {loaded_person.age}")

Random Access of Binary Files

Random access allows you to read from or write to any part of a binary file without having to read it sequentially from the beginning.

This is achieved using the seek() method, which moves the file pointer to the specified location.

It is useful when working with large files, as you can directly access specific data without reading the entire file.

Example Code:
    # Writing to a binary file for demonstration
    with open("example.bin", "wb") as file:
        file.write(b"1234567890ABCDEF")

    # Random access: Reading specific parts of the file
    with open("example.bin", "rb") as file:
        file.seek(4)         # Move to the 5th byte
        print(file.read(2))  # Output: b'56'
        file.seek(-6, 2)     # Move to the 6th byte from the end
        print(file.read(4))  # Output: b'ABCD'

Zipping and Unzipping Files

Python's zipfile module allows for compressing and decompressing files.

This is useful for reducing file size or grouping multiple files into a single archive. Note that ZipFile stores files uncompressed by default; pass compression=zipfile.ZIP_DEFLATED to actually reduce their size.

● Creating a Zip Archive: Use the ZipFile class in write mode ('w') to add files to a new zip archive.
● Extracting from a Zip Archive: Use the ZipFile class in read mode ('r') to extract files.

Example Code:
    import zipfile
    # Zipping files
    with zipfile.ZipFile("archive.zip", "w") as zipf:
        zipf.write("example.bin")  # Add binary file to the zip archive
        zipf.write("sample.txt")   # Add a text file to the zip archive

    # Unzipping files
    with zipfile.ZipFile("archive.zip", "r") as zipf:
        zipf.extractall("extracted_files")  # Extract all files into a directory

Working with Directories

The os module in Python provides methods for interacting with directories, including creating, renaming, listing contents, and deleting directories.

● os.mkdir(): Creates a new directory.
● os.listdir(): Lists all files and directories in a specified directory.
● os.rename(): Renames a file or directory.
● os.rmdir(): Removes an empty directory.
● shutil.rmtree(): Removes a directory and its contents (from the shutil module).

Example Code:
    import os
    import shutil

    # Creating a new directory
    os.mkdir("my_directory")

    # Listing contents of a directory
    print("Contents of current directory:", os.listdir("."))

    # Renaming the directory
    os.rename("my_directory", "renamed_directory")

    # Removing a directory
    shutil.rmtree("renamed_directory")

Example 1: Reading from and Writing to Specific Parts of a Binary File
    with open("example.bin", "wb") as file:
        file.write(b"HelloWorld")

    with open("example.bin", "rb+") as file:
        file.seek(5)            # Move to the 6th byte
        file.write(b"Python")   # Overwrite from the 6th byte
        file.seek(0)
        print(file.read())      # Output: b'HelloPython'

Example 2: Adding Multiple Files to a Zip Archive
    import zipfile

    with zipfile.ZipFile("data.zip", "w") as zipf:
        files_to_zip = ["example.bin", "sample.txt"]
        for file in files_to_zip:
            zipf.write(file)

Example 3: Extracting Specific Files from a Zip Archive
    import zipfile

    with zipfile.ZipFile("data.zip", "r") as zipf:
        zipf.extract("sample.txt", "extracted_specific")  # Extracts only sample.txt

Example 4: Creating a Nested Directory Structure
    import os

    os.makedirs("parent_dir/child_dir", exist_ok=True)
    print("Directories created:",
          os.listdir("parent_dir"))

Example 5: Copying and Removing a Directory
    import shutil

    # Copy directory
    shutil.copytree("parent_dir", "backup_dir")

    # Remove directory
    shutil.rmtree("parent_dir")

Introduction to Regular Expressions

Regular Expressions (RegEx) are sequences of characters that form a search pattern.

They are widely used for string matching and manipulation, such as finding, replacing, or validating specific patterns within text.

Python provides the re module to work with regular expressions.

● Basic Functions in re Module:
  ○ re.search(): Searches for a pattern within a string and returns the first match.
  ○ re.match(): Checks if a pattern matches at the beginning of a string.
  ○ re.findall(): Finds all instances of a pattern in a string.
  ○ re.sub(): Replaces occurrences of a pattern with a specified string.

Example Code:
    import re

    text = "Welcome to Python programming!"

    # Searching for the word "Python"
    match = re.search("Python", text)
    if match:
        print("Pattern found:", match.group())
    else:
        print("Pattern not found.")

Sequence Characters in Regular Expressions

Sequence characters (also known as metacharacters) allow you to specify patterns in regular expressions.

Each character or symbol has a special meaning and can be used to create complex search patterns.
Common Sequence Characters:

Character   Description                             Example
.           Matches any character except newline    a.b matches acb, a b
^           Start of a string                       ^Hello matches strings that start with "Hello"
$           End of a string                         end$ matches strings that end with "end"
*           Zero or more occurrences                ab* matches a, ab, abb, etc.
+           One or more occurrences                 ab+ matches ab, abb, etc., but not a
?           Zero or one occurrence                  ab? matches a or ab
{m,n}       Matches between m and n repetitions     a{2,4} matches aa, aaa, or aaaa
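
As a quick check of the quantifiers in the table, the sketch below tries each example pattern against a few candidate strings. It relies on re.fullmatch(), a standard re function that this chapter does not otherwise cover, so that the whole string has to match; the helper name matching_strings and the candidate lists are made up for illustration.

```python
import re

# Hypothetical helper: return the candidates that the pattern matches in full.
def matching_strings(pattern, candidates):
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["a", "ab", "abb", "abbb"]

star = matching_strings(r"ab*", candidates)      # zero or more "b"
plus = matching_strings(r"ab+", candidates)      # one or more "b"
optional = matching_strings(r"ab?", candidates)  # zero or one "b"
counted = matching_strings(r"a{2,4}", ["a", "aa", "aaaa", "aaaaa"])
dot = matching_strings(r"a.b", ["acb", "a b", "ab", "a\nb"])

print("ab*    :", star)      # ['a', 'ab', 'abb', 'abbb']
print("ab+    :", plus)      # ['ab', 'abb', 'abbb']
print("ab?    :", optional)  # ['a', 'ab']
print("a{2,4} :", counted)   # ['aa', 'aaaa']
print("a.b    :", dot)       # ['acb', 'a b']
```

The last line shows the "except newline" rule from the table: "a\nb" is rejected because . does not match the newline character.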

Example Code:
    import re

    text = "Hello, world! Hello again!"

    # Matching "Hello" only at the beginning of the string
    match = re.search(r"^Hello", text)
    print("Start match:", match.group() if match else "No match")

    # Matching "again!" only at the end of the string
    match = re.search(r"again!$", text)
    print("End match:", match.group() if match else "No match")

    # Finding all occurrences of 'Hello' in the text
    matches = re.findall(r"Hello", text)
    print("All matches:", matches)

Special Characters in Regular Expressions

Special characters, also called metacharacters, are symbols with unique meanings that allow us to build complex search patterns.

These characters enable precise control over pattern matching.
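
Because these symbols carry a special meaning, matching one of them literally requires escaping it with a backslash. A minimal sketch on a made-up string; re.escape() is a standard re function that this chapter does not otherwise cover, and it is used here only for illustration.

```python
import re

text = "Total: 3.50 (approx.) or 3x50?"

# Unescaped: "." matches any character, so "3x50" is matched as well.
loose = re.findall(r"3.50", text)

# Escaped: "\." matches only a literal dot.
strict = re.findall(r"3\.50", text)

# re.escape() inserts the backslashes needed to treat a string literally.
literal = re.escape("(approx.)")
found = re.search(literal, text)

print(loose)          # ['3.50', '3x50']
print(strict)         # ['3.50']
print(found.group())  # (approx.)
```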

Common Special Characters in Regular Expressions:

Character   Description                                                      Example
\d          Matches any digit (0-9)                                          \d{3} matches "123" in "abc123"
\D          Matches any non-digit                                            \D{2} matches "ab" in "ab123"
\w          Matches any word character (a-z, A-Z, 0-9, and underscore)       \w{4} matches "word" in "word123"
\W          Matches any non-word character                                   \W+ matches "!@#" in "Hello!@#"
\s          Matches any whitespace character (space, tab, newline)           \s+ matches " " in "Hello world"
\S          Matches any non-whitespace character                             \S+ matches "Hello" in "Hello world"
[]          Matches any one character within the brackets                    [aeiou] matches vowels in "hello"
[^ ]        Matches any one character not in brackets                        [^aeiou] matches consonants in "hello"
()          Groups a pattern together                                        (abc)+ matches "abcabc" in "abcabcxyz"
|           Acts as an OR operator                                           cat|dog matches "cat" or "dog"
\b          Matches word boundaries                                          \bcat\b matches "cat" in "cat dog"
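
The rows for \W, \s, | and () can be tried out as follows. This is only a sketch on a made-up string; re.split() is a standard re function that the chapter does not otherwise cover, and it is assumed here for illustration.

```python
import re

text = "cat, dog;  bird\tfish"

# \W+ matches runs of non-word characters, so splitting on it leaves the words.
words = re.split(r"\W+", text)

# \s+ matches runs of whitespace: a single space, two spaces, and a tab.
gaps = re.findall(r"\s+", text)

# | tries each alternative in turn.
pets = re.findall(r"cat|dog", text)

# () captures only the part of the match inside the parentheses.
match = re.search(r"(\w+);", text)
before_semicolon = match.group(1) if match else None

print(words)             # ['cat', 'dog', 'bird', 'fish']
print(gaps)              # [' ', '  ', '\t']
print(pets)              # ['cat', 'dog']
print(before_semicolon)  # dog
```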

Example Code:
    import re

    text = "Hello World! Number: 123, Another number: 456."

    # Using \d to match digits
    digits = re.findall(r"\d+", text)
    print("Digits found:", digits)  # Output: ['123', '456']

    # Using \b to find whole words
    whole_words = re.findall(r"\bNumber\b", text)
    print("Whole word 'Number' found:", whole_words)  # Output: ['Number']

    # Using character classes
    letters = re.findall(r"[A-Z]", text)
    print("Uppercase letters:", letters)  # Output: ['H', 'W', 'N', 'A']

Using Regular Expressions on Files

Regular expressions can be extremely useful for extracting and manipulating data within text files.

You can use Python's file handling with regular expressions to search, match, and replace patterns in large files.

Steps:

1. Open the File: Use open() to read the file.
2. Read Line-by-Line or Whole Content: You can read each line individually or the entire content.
3. Apply Regular Expressions: Use functions like re.search(), re.findall(), re.sub() on the file content.
4. Save Results (if needed): Write the modified content back to the file or to a new file.

Example Code:
    import re

    # Open a text file for reading
    with open("sample.txt", "r") as file:
        content = file.read()

    # Find all email addresses in the file
    emails = re.findall(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", content)
    print("Emails found:", emails)

    # Replace phone numbers with a placeholder
    updated_content = re.sub(r"\d{3}-\d{3}-\d{4}", "[PHONE]", content)

    # Write the modified content back to a new file
    with open("updated_sample.txt", "w") as file:
        file.write(updated_content)

Example 1: Extracting Hashtags from a File
    import re

    # Read file and extract hashtags
    with open("social_posts.txt", "r") as file:
        posts = file.read()
    hashtags = re.findall(r"#\w+", posts)
    print("Hashtags found:", hashtags)

Example 2: Replacing Dates in a File
    import re

    # Replace all dates (format: DD/MM/YYYY) with "DATE" placeholder
    with open("events.txt", "r") as file:
        content = file.read()
    updated_content = re.sub(r"\b\d{2}/\d{2}/\d{4}\b", "DATE", content)

    # Save to new file
    with open("updated_events.txt", "w") as file:
        file.write(updated_content)

Retrieving Information from an HTML File Using Regular Expressions

HTML files often contain structured data, which you may want to extract, such as text within specific tags (e.g., <title>, <a>, <p> tags).

While Python offers libraries like BeautifulSoup for HTML parsing, you can also use regular expressions for simpler cases.

Regular expressions allow you to match patterns within HTML files, like retrieving content within tags or finding specific attributes.

Steps:

1. Open and Read the HTML File: Load the HTML content using Python's file handling.
2. Define Patterns: Write regex patterns to match specific HTML tags or attributes.
3. Extract Information: Use functions like re.findall() to capture desired data.

Example Code:
    import re

    # Sample HTML content for demonstration
    html_content = """
    <html>
    <head><title>Sample Page</title></head>
    <body>
    <h1>Welcome to My Website</h1>
    <p>This is a <a href="https://example.com">link</a> to an example site.</p>
    <p>Contact us at <a href="mailto:contact@example.com">contact@example.com</a>.</p>
    </body>
    </html>
    """

    # Extract title content
    title = re.findall(r"<title>(.*?)</title>", html_content)
    print("Title:", title[0] if title else "No title found")

    # Extract all hyperlinks
    links = re.findall(r'href="(http.*?)"', html_content)
    print("Links:", links)

    # Extract all email addresses
    emails = re.findall(r'href="mailto:(.*?)"', html_content)
    print("Emails:", emails)
Explanation of Key Patterns:

● <title>(.*?)</title>: Matches content within <title> tags. .*? is a non-greedy match for any character between the tags.
● href="(http.*?)": Matches URLs within href attributes that start with "http".
● href="mailto:(.*?)": Matches email addresses in mailto links.
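
To see why these patterns use the non-greedy .*? rather than .*, the sketch below runs both forms on a made-up line that holds two <p> elements.

```python
import re

html_line = "<p>First</p><p>Second</p>"

# Greedy: .* runs on to the LAST </p>, swallowing the tags in between.
greedy = re.findall(r"<p>(.*)</p>", html_line)

# Non-greedy: .*? stops at the FIRST </p> that follows each <p>.
non_greedy = re.findall(r"<p>(.*?)</p>", html_line)

print(greedy)      # ['First</p><p>Second']
print(non_greedy)  # ['First', 'Second']
```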

Examples:

Example 1: Extracting All Paragraph Text
    paragraphs = re.findall(r"<p>(.*?)</p>", html_content)
    print("Paragraphs:", paragraphs)

Example 2: Retrieving Text from <h1> Tags
    heading = re.findall(r"<h1>(.*?)</h1>", html_content)
    print("H1 Tag Content:", heading)
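
The examples above work on an in-memory string. The sketch below follows the listed steps end to end by first reading the HTML from a file, and it also shows one limitation: since . does not match a newline, a tag whose content spans several lines is only found when the standard re.DOTALL flag is passed. The file name page.html and the sample markup are made up for illustration, and the file is written to a temporary directory so the example is self-contained.

```python
import os
import re
import tempfile

sample_html = """<html>
<body>
<p>Visit <a href="https://example.com">our site</a>
or write to us.</p>
</body>
</html>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "page.html")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write(sample_html)

    # Step 1: open and read the HTML file.
    with open(path, "r", encoding="utf-8") as file:
        html_text = file.read()

# Steps 2 and 3: two groups capture the URL and the link text together.
links = re.findall(r'<a href="(.*?)">(.*?)</a>', html_text)

# The <p> element spans two lines, so the plain pattern finds nothing;
# with re.DOTALL the dot also matches newlines.
one_line_only = re.findall(r"<p>(.*?)</p>", html_text)
multi_line = re.findall(r"<p>(.*?)</p>", html_text, re.DOTALL)

print(links)          # [('https://example.com', 'our site')]
print(one_line_only)  # []
print(multi_line)
```

When a pattern has more than one group, re.findall() returns a list of tuples, one tuple per match.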
