Regular Expressions in Python

The document provides an overview of Regular Expressions (Regex) in Python, explaining their importance in text processing, validation, and data extraction. It details key functions in the re module, such as re.match, re.search, and re.findall, along with the benefits of using re.compile for efficiency. Additionally, it covers common regex patterns, practical applications, and advanced techniques like groups and alternation.

Uploaded by

Praghya

Introduction
Regular expressions (regex) are patterns used to match character combinations in strings.
They are an essential tool for processing text and data across many domains. In web
development, regex validates user inputs such as email addresses or passwords. Data
scientists and analysts use regex to clean, transform, and extract meaningful patterns from
raw datasets. Regex also plays a central role in parsing log files, identifying errors in
large datasets, and extracting specific information from documents or web pages. This
versatility makes it a foundational skill for software engineers, data professionals, and system
administrators alike. Regular expressions are widely used for:

• Text validation (e.g., email validation)
• Searching within text
• Text manipulation (e.g., replacing patterns)
• Parsing complex datasets (e.g., logs, HTML, or CSV files)

Python provides the re module to work with regular expressions. This document explains
regex concepts with examples and outputs to help beginners understand and apply regex
effectively.

Basics of Regular Expressions

Raw String Literals in Python ( r prefix)

• Raw strings in Python (e.g., r"\d") treat backslashes literally, simplifying regex
patterns. Without the r prefix, double backslashes are required.
• Example:

import re
pattern = r"\d"
string = "abc123"
result = re.search(pattern, string)
print(result.group()) # Output: 1
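
To make the point about backslashes concrete, the short sketch below compares a raw string with its escaped equivalent. The variable names are illustrative only.

```python
import re

raw_pattern = r"\d"        # raw string: the backslash is kept as written
escaped_pattern = "\\d"    # normal string: the backslash has to be doubled

# Both spellings produce the same two-character pattern: a backslash and "d".
same_pattern = raw_pattern == escaped_pattern
first_digit = re.search(escaped_pattern, "abc123").group()

print(same_pattern)   # True
print(first_digit)    # 1
```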

Key Functions in the re Module

1. re.match()

Matches a pattern at the start of the string.

import re
result = re.match(r'\d+', '123abc')
if result:
    print(result.group())  # Output: 123

2. re.search()

Searches for the first occurrence of a pattern anywhere in the string.

result = re.search(r'\d+', 'abc123def')
if result:
    print(result.group())  # Output: 123
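
The difference between re.match() and re.search() is easiest to see when both run on the same input. A minimal sketch, with an input string chosen for illustration:

```python
import re

text = "abc123def"

at_start = re.match(r"\d+", text)    # the digits are not at position 0
anywhere = re.search(r"\d+", text)   # scans forward until it finds digits

print(at_start)            # None
print(anywhere.group())    # 123
```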

3. re.findall()

Returns all occurrences of a pattern as a list.

result = re.findall(r'\d+', 'abc123def456')

print(result) # Output: ['123', '456']

4. re.split()

Splits a string by occurrences of a pattern.

result = re.split(r'\d+', 'abc123def456')

print(result) # Output: ['abc', 'def', '']

5. re.sub()

Replaces occurrences of a pattern with a replacement string.

result = re.sub(r'\s+', '-', 'This is a test')

print(result)  # Output: This-is-a-test

6. re.compile()

Creates a reusable regex pattern object for efficiency.

pattern = re.compile(r'\d+')
result = pattern.findall('123abc456')
print(result) # Output: ['123', '456']

What does re.compile do?

When you use re.compile, it "prepares" (or compiles) your regular expression into a reusable
pattern object. This object can then be used multiple times for different operations like finding
matches, replacing text, etc., without passing the pattern string again.
Without re.compile, module-level functions such as re.findall or re.search must look the
pattern up on every call. The re module caches recently compiled patterns, so the pattern is
usually not recompiled from scratch, but the lookup still costs a little each time. Using
re.compile skips that step when you work with the same pattern many times in your code.
Key Benefits of re.compile:
1. Improved performance: The pattern is compiled once and reused, which avoids the
per-call cache lookup when you use it repeatedly. The gain is usually small.
2. Better readability: The pattern is defined once, given a name, and reused in a clear,
structured way.

Example Without re.compile:

Imagine you need to find all numbers in multiple strings. Without re.compile,
you'll repeatedly pass the pattern to the re functions:
import re

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Find all numbers in each string
    matches = re.findall(r'\d+', s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']
Here, the pattern r'\d+' is passed to re.findall on every call, and the re module has to
look it up in its internal cache each time.

Example With re.compile:

If you use re.compile, the pattern is prepared once and reused for each string:
import re

# Compile the pattern once
pattern = re.compile(r'\d+')

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Use the compiled pattern to find numbers
    matches = pattern.findall(s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']

What’s the Difference?

• Without re.compile: The pattern string is passed on every call, and re.findall looks up
(or, on first use, builds) the compiled pattern each time.
• With re.compile: The pattern is compiled once and the resulting object is reused for all
operations.
If you're working with the pattern only once, re.compile doesn't make a noticeable difference.
However, if the pattern is reused many times (e.g., in a loop or across different parts of
your code), using re.compile avoids the repeated lookups and makes your code more readable.

Summary:
• re.compile is useful when you use the same pattern repeatedly.
• It saves time by compiling the pattern once and allows you to use the compiled object
for all regex operations.
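
A compiled pattern object offers the same operations as the module-level functions. The sketch below reuses one compiled pattern for searching, replacing, and splitting; the variable names and the sample text are assumptions made for illustration.

```python
import re

digits = re.compile(r"\d+")
text = "abc123def456"

first = digits.search(text).group()   # first run of digits
masked = digits.sub("#", text)        # replace every run of digits
parts = digits.split(text)            # split on runs of digits

print(first)    # 123
print(masked)   # abc#def#
print(parts)    # ['abc', 'def', '']
```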

Common Regex Patterns and Characters

Table 1: Basic Characters and Character Classes

In this table, "Matches" means the whole string matches the example pattern (as with re.fullmatch).

Character/Pattern  Description                                               Example  Matches                Fails to Match
.                  Matches any character except a newline.                   "a.b"    "acb", "a1b"           "ab", "a\nb"
\w                 Matches any alphanumeric character or underscore          "\w+"    "hello", "Python_123"  "hello!", "123$"
                   ([a-zA-Z0-9_]).
\W                 Matches any non-alphanumeric character.                   "\W+"    "!!!", "#@$%"          "abc123", "hello"
\d                 Matches any digit ([0-9]).                                "\d{3}"  "123", "456"           "12", "abc"
\D                 Matches any non-digit character.                          "\D+"    "hello", "abc!"        "1234", "567"
\s                 Matches any whitespace character (space, tab, newline).   "\s+"    " ", "\t"              "abc", "123"
\S                 Matches any non-whitespace character.                     "\S+"    "hello123", "abc!"     " ", "\n"
Anchors and Special Characters

In this table, "Matches" means the pattern is found somewhere in the string (as with re.search).

Character/Pattern  Description                                      Example    Matches         Fails to Match
^                  Anchors the pattern to the start of the string.  "^hello"   "hello world"   "world hello", "abc hello"
$                  Anchors the pattern to the end of the string.    "world$"   "hello world"   "world hello", "hello"
\b                 Matches a word boundary.                         "\bcat\b"  "cat", "a cat"  "catalog", "scattered"

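Anchors are usually combined with re.search, which looks for the pattern anywhere in the string. The sketch below checks the anchor examples that way; the dictionary layout is just one convenient way to organize the cases.

```python
import re

# (pattern, text) -> whether re.search is expected to find a match
expected = {
    (r"^hello", "hello world"): True,
    (r"^hello", "world hello"): False,
    (r"world$", "hello world"): True,
    (r"world$", "world hello"): False,
    (r"\bcat\b", "a cat"): True,
    (r"\bcat\b", "catalog"): False,
    (r"\bcat\b", "scattered"): False,
}

found = {
    (pattern, text): re.search(pattern, text) is not None
    for pattern, text in expected
}

print(found == expected)  # True
```
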
Quantifiers

In this table, "Matches" means the whole string matches the example pattern (as with re.fullmatch).

Character/Pattern  Description                                                  Example   Matches             Fails to Match
*                  Matches zero or more repetitions of the preceding element.  "a*b"     "b", "ab", "aaab"   "cab", "c"
+                  Matches one or more repetitions of the preceding element.   "a+b"     "ab", "aaab"        "b", "c"
{n}                Matches exactly n repetitions.                               "a{3}"    "aaa"               "aa", "aaaa"
{n,}               Matches at least n repetitions.                              "a{2,}"   "aa", "aaa"         "a"
{n,m}              Matches between n and m repetitions.                         "a{1,3}"  "a", "aa", "aaa"    "aaaa"
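
The quantifier examples can be verified in the same spirit with re.fullmatch, so that extra characters before or after the repetition count as a failure. The helper name full is hypothetical.

```python
import re

def full(pattern, s):
    """Return True when the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

star = [full(r"a*b", s) for s in ["b", "ab", "aaab", "cab"]]
plus = [full(r"a+b", s) for s in ["ab", "aaab", "b"]]
exact = [full(r"a{3}", s) for s in ["aaa", "aa", "aaaa"]]
at_least = [full(r"a{2,}", s) for s in ["aa", "aaa", "a"]]
between = [full(r"a{1,3}", s) for s in ["a", "aa", "aaa", "aaaa"]]

print(star)      # [True, True, True, False]
print(plus)      # [True, True, False]
print(exact)     # [True, False, False]
print(at_least)  # [True, True, False]
print(between)   # [True, True, True, False]
```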

Examples of Practical Applications

Email Validation
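
A minimal sketch of email validation follows. The pattern is a deliberately simplified assumption, not a complete validator for every address the email standards allow, and the names EMAIL_RE and is_valid_email are hypothetical.

```python
import re

# Simplified pattern (an assumption): local part, "@", one or more domain
# labels, and a final label of at least two letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def is_valid_email(address):
    """Return True when the whole address fits the simplified pattern."""
    return EMAIL_RE.fullmatch(address) is not None

print(is_valid_email("user@example.com"))   # True
print(is_valid_email("user@example"))       # False
print(is_valid_email("user example.com"))   # False
```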

Extracting Data

Example 1: Extract Numbers

Example 2: Extract Email Addresses

Example 3: Validate Mobile Numbers

Example 4: Extract Hours from Timestamps
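
One possible way to extract hours, assuming timestamps in HH:MM:SS form, is sketched below. The sample text and the timestamp format are assumptions.

```python
import re

log = "Start 09:15:02, retry 13:40:59, end 23:05:10"

# Capture only the two-digit hour that precedes ":MM:SS".
hours = re.findall(r"\b(\d{2}):\d{2}:\d{2}\b", log)

print(hours)  # ['09', '13', '23']
```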

Example 5: Extract Specific Data from Text

Using Regex Flags

Multi-line Matching
Case-Insensitive Matching
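
The two flags named above are re.MULTILINE and re.IGNORECASE. The sketch below shows the effect of each on a small, made-up text.

```python
import re

text = "Error: disk full\nerror: retrying\nok"

# re.IGNORECASE: "error" matches regardless of letter case.
any_case = re.findall(r"error", text, flags=re.IGNORECASE)

# re.MULTILINE: ^ matches at the start of every line, not only the string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
without_flag = re.findall(r"^\w+", text)

print(any_case)      # ['Error', 'error']
print(line_starts)   # ['Error', 'error', 'ok']
print(without_flag)  # ['Error']
```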

Advanced Regex Techniques

Groups and Alternation

The following sections explain the main regex building blocks: alternation, groups,
character sets, and special sequences.

1. | (Either or):

The pipe | is used to match either of two or more options.

Example:

• Explanation: The pattern matches either "falls" or "stays" in the input text.
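
A minimal sketch consistent with this explanation; the input text is an assumption.

```python
import re

text = "The rain falls and the sun stays"

# Match either of the two alternatives wherever they occur.
verbs = re.findall(r"falls|stays", text)

print(verbs)  # ['falls', 'stays']
```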

2. () (Capture and group):

Parentheses are used to group parts of a pattern and capture them as separate groups.

Example:
• Explanation:
o (rain|sun) captures "rain" or "sun".
o (falls|stays) captures "falls" or "stays".
o The result is a list of tuples containing the captured groups.
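
A sketch that fits the explanation above, using a made-up input string. With two capturing groups, re.findall returns one tuple per match.

```python
import re

text = "rain falls, sun stays"

# Two capturing groups, so each match is reported as a 2-tuple.
pairs = re.findall(r"(rain|sun) (falls|stays)", text)

print(pairs)  # [('rain', 'falls'), ('sun', 'stays')]
```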

3. [] (Set of characters):

Square brackets [] define a set of characters to match.

Example:

• Explanation: The pattern [a-z] matches any lowercase letter from 'a' to 'z'. Each match
is returned as a separate element in the list.
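
For instance, with an input string chosen for illustration:

```python
import re

# Uppercase letters, digits, and spaces fall outside [a-z] and are skipped.
letters = re.findall(r"[a-z]", "Hi 5 ok")

print(letters)  # ['i', 'o', 'k']
```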

4. \ (Special sequence):

The backslash \ is used to escape special characters or represent special sequences.

Example:
• Explanation: The pattern \d matches any digit (0–9). Here, it finds all the digits in the
input text.
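
A small sketch of this behavior; the input text is an assumption.

```python
import re

# \d matches one digit at a time, so "42" yields two separate results.
digits_found = re.findall(r"\d", "Room 42, floor 7")

print(digits_found)  # ['4', '2', '7']
```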

Another Example (Escaping Special Characters):

• Explanation: The backslash \ escapes the special meaning of |, treating it as a literal
character to match.
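
A sketch of escaping the pipe, with a made-up input string:

```python
import re

text = "a|b|c"

# \| matches a literal pipe instead of acting as "either or".
pipes = re.findall(r"\|", text)
fields = re.split(r"\|", text)

print(pipes)   # ['|', '|']
print(fields)  # ['a', 'b', 'c']
```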

The .group() method in Python regular expressions is used to extract the part of the string
that matches the pattern or a specific group within the match.

Syntax of .group()

match.group([group_number])

• group_number (optional):
o If not specified (i.e., .group()), it returns the entire match.
o group(0): Returns the entire match (same as .group()).
o group(n): Returns the text matched by the n-th capturing group (inside
parentheses).

Example 1: Using .group() to Return the Entire Match

• Explanation:
o The pattern \d+ matches one or more digits.
o .search() finds the first match ("123").
o .group() returns the entire match.
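
A minimal sketch matching this explanation; the input string is an assumption.

```python
import re

match = re.search(r"\d+", "Order 123 shipped")

whole = match.group()      # entire match
whole_0 = match.group(0)   # same result, written explicitly

print(whole)    # 123
print(whole_0)  # 123
```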

Example 2: Using .group(n) to Access Capturing Groups

• Explanation:
o The parentheses () create a capturing group for the digits (\d+).
o .group(1) returns the content of the first capturing group (the digits "123").
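
One way this could look, assuming an input where the digits follow the word "Order":

```python
import re

match = re.search(r"Order (\d+)", "Order 123 shipped")

full_match = match.group(0)   # everything the pattern matched
order_id = match.group(1)     # only the part inside the parentheses

print(full_match)  # Order 123
print(order_id)    # 123
```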

Example 3: Multiple Capturing Groups

• Explanation:
o (123) is captured in group 1.
o (apples) is captured in group 2.
o .group(0) always returns the full match.
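
A sketch consistent with the explanation above; the sentence used as input is an assumption.

```python
import re

match = re.search(r"(\d+) (\w+)", "I bought 123 apples")

quantity = match.group(1)    # first capturing group
item = match.group(2)        # second capturing group
full_match = match.group(0)  # the complete match

print(quantity)    # 123
print(item)        # apples
print(full_match)  # 123 apples
```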

Example 4: Named Groups

You can assign names to groups and access them with .group('name').

• Explanation:
o (?P<number>\d+) names the first group "number".
o (?P<item>\w+) names the second group "item".
o .group('name') retrieves the content of the named group.
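
A sketch using the two named groups described above; the input sentence is an assumption.

```python
import re

match = re.search(r"(?P<number>\d+) (?P<item>\w+)", "I bought 123 apples")

number = match.group("number")
item = match.group("item")
as_dict = match.groupdict()   # all named groups at once

print(number)   # 123
print(item)     # apples
print(as_dict)  # {'number': '123', 'item': 'apples'}
```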

What Happens If There’s No Match?

If there’s no match, functions such as re.search and re.match return None, and calling
.group() on None raises an AttributeError. To avoid this, always check that the match is
not None before using .group().
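
A minimal sketch of that check; the fallback string "no match" is just an illustrative choice.

```python
import re

match = re.search(r"\d+", "no digits here")

# re.search returned None here, so match.group() would raise AttributeError.
result = match.group() if match is not None else "no match"

print(result)  # no match
```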

Summary:

• .group() returns the entire match.

• .group(n) returns the n-th capturing group.
• Named groups allow you to retrieve specific parts of the match using names.
