RegEx in Python

Uploaded by

Yash Verma


REGULAR EXPRESSIONS (REGEX) IN PYTHON:

Regular Expressions (RegEx) are a powerful tool for pattern matching and text manipulation. In Python, regex
functionality is implemented through the re module.

APPLICATIONS OF REGEX
● Data validation
● Data extraction
● Input sanitization (data cleaning)

This document explains regex basics, syntax, functions, and practical examples with improved clarity and structure.

What is a Regular Expression?

A Regular Expression is a sequence of characters that defines a search pattern. It can be used to match strings,
validate formats, or extract information.

COMMON USE CASES OF REGEX (SEVERAL ARE COVERED IN THIS ARTICLE WITH DETAILED EXPLANATION):

● Extracting email addresses
● Extracting timestamps from logs
● Extracting URLs
● Validating phone numbers or dates
● Searching for words or patterns in text
● Validating passwords

Regex Syntax in Python

To use regex, you define a pattern made up of literal characters, special characters and sequences that together
describe what to look for in a text.
Here are some of the most common components of regex syntax:

1. SPECIAL CHARACTERS
Character   Description
.           Matches any single character except a newline.
^           Matches the start of the string.
$           Matches the end of the string.
*           Matches 0 or more repetitions of the preceding element.
+           Matches 1 or more repetitions of the preceding element.
?           Matches 0 or 1 occurrence of the preceding element.
{n}         Matches exactly n occurrences.
{n,}        Matches n or more occurrences.
{n,m}       Matches between n and m occurrences.
\           Escapes a special character so that it is matched literally.
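
The quantifiers and anchors in the table can be tried out with a short sketch like the one below. The sample text and variable names are illustrative assumptions, not part of the original article.

```python
import re

# Illustrative sample text (an assumption, not from the article).
text = "ct cat caat caaat"

star = re.findall(r"ca*t", text)            # 0 or more 'a'
plus = re.findall(r"ca+t", text)            # 1 or more 'a'
optional = re.findall(r"ca?t", text)        # 0 or 1 'a'
exactly_two = re.findall(r"ca{2}t", text)   # exactly 2 'a'
two_or_more = re.findall(r"ca{2,}t", text)  # 2 or more 'a'
one_to_two = re.findall(r"ca{1,2}t", text)  # between 1 and 2 'a'
any_char = re.findall(r"c.t", text)         # exactly one character between c and t

# Anchors: ^ ties the pattern to the start, $ to the end of the string.
starts_with_ct = re.search(r"^ct", text) is not None
ends_with_caaat = re.search(r"caaat$", text) is not None

# A backslash makes the dot literal instead of "any character".
literal_dots = re.findall(r"\.", "a.b.c")

for name, value in [("*", star), ("+", plus), ("?", optional),
                    ("{2}", exactly_two), ("{2,}", two_or_more),
                    ("{1,2}", one_to_two), (".", any_char)]:
    print(name, value)
```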

Created by: Anjali Garg | Data Scientist | Aspiring ML Engineer | https://www.linkedin.com/in/anjali-garg-2a7747222/

2. CHARACTER CLASSES
Syntax        Description
[arn]         Matches one character that is a, r or n.
[a-n]         Matches one lowercase letter from a to n.
[^arn]        Matches one character that is NOT a, r or n.
[0123]        Matches one of the digits 0, 1, 2 or 3.
[0-9]         Matches one digit from 0 to 9.
[0-5][0-9]    Matches a two-digit number from 00 to 59.
[a-zA-Z]      Matches one letter, lowercase or uppercase.
[+]           Inside a set, most special characters lose their special meaning, so this matches a literal '+'.
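
As a quick check of the sets above, here is a minimal sketch. The sample strings and variable names are made up for illustration.

```python
import re

# Illustrative sample text (an assumption, not from the article).
text = "Room 42, gate B7, +15 min"

digits = re.findall(r"[0-9]", text)              # every single digit
uppercase = re.findall(r"[A-Z]", text)           # every uppercase letter
others = re.findall(r"[^a-zA-Z0-9 ]", text)      # anything that is not a letter, digit or space
plus_signs = re.findall(r"[+]", text)            # '+' is literal inside a set
minutes = re.findall(r"[0-5][0-9]", "07 59 60 99")  # two-digit values 00-59

print(digits, uppercase, others, plus_signs, minutes)
```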

3. PREDEFINED SEQUENCES
Sequence   Description
\A         Matches only at the start of the string.
\b         Matches at a word boundary, i.e. at the beginning or at the end of a word.
\B         Matches at a position that is NOT at the beginning or at the end of a word.
\d         Matches any digit (0-9).
\D         Matches any character that is NOT a digit.
\s         Matches any whitespace character.
\S         Matches any character that is NOT a whitespace character.
\w         Matches any word character, i.e. a-z, A-Z, 0-9 and underscore.
\W         Matches any character that is NOT a word character.
\Z         Matches only at the end of the string.
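
The difference between \b and \B is easiest to see on a concrete string. The following sketch uses an invented sample text; the names are assumptions for illustration only.

```python
import re

# Illustrative sample text (an assumption, not from the article).
text = "cat concat cats 2024"

whole_word = re.findall(r"\bcat\b", text)   # 'cat' as a standalone word only
inner = re.search(r"\Bcat\b", text)         # 'cat' at the end of a longer word ("concat")
numbers = re.findall(r"\d+", text)          # runs of digits
words = re.findall(r"\w+", text)            # runs of word characters
pieces = re.split(r"\s+", text)             # split on whitespace

starts_with_cat = re.search(r"\Acat", text) is not None
ends_with_year = re.search(r"\d{4}\Z", text) is not None

print(whole_word, inner.start(), numbers, words)
```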

4. GROUPING AND CAPTURING

Parentheses () are used to group parts of a regex pattern and capture matches. Capturing groups save the matched
content for later use, while non-capturing groups allow grouping without saving the matched content.

CAPTURING GROUP
A capturing group matches the specified pattern and saves the matched content for reference. For example:

import re

pattern = r"(\d{3})-(\d{2})-(\d{4})"
text = "123-45-6789"
match = re.match(pattern, text)
print(match.groups()) # Output: ('123', '45', '6789')

NON-CAPTURING GROUP
A non-capturing group groups the pattern without saving the matched content. Use (?:...) to create a
non-capturing group. For example:

pattern = r"(?:\d{3})-(\d{2})-(\d{4})"
text = "123-45-6789"
match = re.match(pattern, text)
print(match.groups()) # Output: ('45', '6789')
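
Grouping also changes what re.findall returns: with capturing groups it returns the captured parts, and with only non-capturing groups it returns the whole match. A minimal sketch, with an invented sample string:

```python
import re

# Illustrative sample text (an assumption, not from the article).
text = "123-45-6789 and 987-65-4321"

# Capturing groups: findall returns one tuple of groups per match.
with_groups = re.findall(r"(\d{3})-(\d{2})-(\d{4})", text)

# Non-capturing groups only: findall returns the full matched strings.
without_groups = re.findall(r"(?:\d{3})-(?:\d{2})-(?:\d{4})", text)

# On a match object, group(0) is the whole match and group(n) the n-th capture.
m = re.search(r"(\d{3})-(\d{2})-(\d{4})", text)
whole, middle = m.group(0), m.group(2)

print(with_groups)
print(without_groups)
print(whole, middle)
```

This is one reason the longer patterns later in the article use (?:...) throughout: the whole match is what re.findall should return.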


PRACTICAL EXAMPLES
1. MATCHING EMAIL ADDRESSES
Example: john.doe123@abc-school.ac.in
● The username part, i.e. the part before @ ("john.doe123"):
Can contain letters a-z and A-Z, digits 0-9, dot . and hyphen -. Some providers also allow underscore _ and
other special characters such as +.
○ "john.doe123" : "[a-zA-Z0-9._+-]+" : one or more occurrences of these characters. The hyphen is placed
last in the set so that it is read as a literal hyphen and not as a range.
● The domain part, i.e. the part after @ ("abc-school.ac.in"):
Can contain subdomains, domains, domain extensions and one mandatory final extension that must
contain at least 2 letters.
○ "abc-school.ac" : "[a-zA-Z0-9.-]+"
○ ".in" : "\.[a-zA-Z]{2,}"
# Complete regex:
r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
# Shorter, roughly equivalent regex:
r"[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}"
# (\w: any letter, digit or underscore; {2,} means 2 or more occurrences)
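
Hyphen placement inside a set matters, because a hyphen between two characters is read as a range. The sketch below is an illustration under that assumption: the helper name compiles is hypothetical, and the email pattern is the one used in the test code later in this article.

```python
import re

def compiles(pattern):
    """Return True if the pattern is a valid regex, False otherwise (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# '_-+' is read as a range from '_' down to '+', which is reversed and therefore invalid.
bad_set_ok = compiles(r"[a-zA-Z0-9 ._-+]+")
# With the hyphen last, it is a literal hyphen and the set is valid.
good_set_ok = compiles(r"[a-zA-Z0-9._+-]+")

EMAIL = re.compile(r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_email(candidate):
    """fullmatch requires the whole string to fit the pattern."""
    return EMAIL.fullmatch(candidate) is not None

print(bad_set_ok, good_set_ok)
print(is_email("john.doe123@abc-school.ac.in"), is_email("john.doe@com"))
```

Note that this pattern is a practical approximation, not a full implementation of the email address grammar.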

2. MATCHING QUESTIONS
Examples:
- Is this your final answer?
- "Python is a snake" - is this statement correct?
- Why is the sky blue during the day?
● Start of a question: can be alphanumeric or a quotation mark: r"[a-zA-Z0-9\"']+"
● Middle part of a question: r"[a-zA-Z0-9\"' ,_–+-]*"
(you can include more special characters if they are allowed in the questions, or you can use [^?\n] to match
every character except a question mark and a newline. Keep the hyphen last in the set: a hyphen between two
characters, as in ,-_, is read as a range and would also match characters such as ".", "/", ":" and "?")
● End of a question: r"\?"

# Complete regex:
r"[\w\"']+[\w\"', +-]*\?"
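
To see why the position of the hyphen matters in the middle-part set, the following sketch compares a set where the hyphen forms a range with one where it is literal. The variable names and sample sentence are assumptions for illustration.

```python
import re

# ',-_' is a range from ',' (0x2C) to '_' (0x5F): it covers digits, uppercase
# letters and punctuation such as '.', '/', ':' and '?'.
ranged = re.compile(r"[,-_]")
# With the hyphen last, the set contains only ',', '_' and '-'.
literal = re.compile(r"[,_-]")

range_hits = [ch for ch in "?:/.A-" if ranged.fullmatch(ch)]
literal_hits = [ch for ch in "?:/.A-" if literal.fullmatch(ch)]

# Question pattern with the hyphen kept at the end of the set.
QUESTION = re.compile(r"[\w\"']+[\w\"', +-]*\?")
found = QUESTION.findall("Is this your final answer? No.")

print(range_hits, literal_hits, found)
```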


3. MATCHING URLS
Examples:
- https://www.example.com?query_param1=value1&query_param2=value2

Components of a URL: scheme, host (subdomain, domain, top-level domain), optional port, path, query parameters
and fragment.

Many special characters are allowed in a URL, but some are not: for example, white space is encoded as %20, and
non-ASCII characters are likewise encoded using word characters and a few special characters.

● Scheme (http/https) of the URL followed by :// - r"https?:\/\/"
● Subdomain, domain, top-level domain: r"(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}"
● Port number's non-capturing group: r"(?::[0-9]{1,5})?"
● Path's non-capturing group: r"(?:\/[^\s?#]*)?"
● Query separator and parameters' non-capturing group: r"(?:\?[a-zA-Z0-9%._\-~+=&]*)?"
● Fragment's non-capturing group: r"(?:#[^\s]*)?"

# Complete regex:
r"https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?::[0-9]{1,5})?(?:\/[^\s?#]*)?(?:\?[a-zA-Z0-9%._\-~+=&]*)?(?:#[^\s]*)?"
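
A long pattern like this is easier to read when it is assembled from its components. The sketch below relies on Python joining adjacent string literals; the name URL and the extra test address with a port and fragment are assumptions, and the forward slashes are written unescaped, which is equivalent to \/ in Python.

```python
import re

URL = re.compile(
    r"https?://"                         # scheme: http or https
    r"(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}"  # subdomain(s), domain, top-level domain
    r"(?::[0-9]{1,5})?"                  # optional port
    r"(?:/[^\s?#]*)?"                    # optional path
    r"(?:\?[a-zA-Z0-9%._\-~+=&]*)?"      # optional query string
    r"(?:#[^\s]*)?"                      # optional fragment
)

candidates = [
    "https://www.example.com?query_param1=value1&query_param2=value2",
    "http://example.org/resource",
    "https://example.com:8080/a/b?x=1#top",   # hypothetical address with port and fragment
    "ftp://wrong.protocol.com",
]

results = {c: URL.fullmatch(c) is not None for c in candidates}
for candidate, ok in results.items():
    print(ok, candidate)
```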

4. MATCHING IPV4 ADDRESSES

An IPv4 address consists of four octets, separated by dots (.), where each octet is a number between 0 and 255.
Logic behind regex to match a number between 0-255:
● Number between 0-9: [0-9]
● Number between 10-99: [1-9][0-9]
● Number between 0-99: [0-9][0-9]?
● Number between 0-199: [0-1]?[0-9][0-9]?
● Number between 200-249: 2[0-4][0-9]
● Number between 250-255: 25[0-5]

The alternatives for 200-255 are listed first. Otherwise, in an unanchored search, the 0-199 alternative could
match only the "25" of a trailing "255" and stop there.

Regex for a number between 0-255: r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)"

# Complete regex:
r"(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)"
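
The octet logic can be checked with a small sketch that builds the address pattern from a single octet pattern. The names OCTET, IPV4 and is_ipv4 are hypothetical, and the octet pattern is a version that covers the whole 200-255 range (2[0-4][0-9] for 200-249 and 25[0-5] for 250-255).

```python
import re

# One octet, 0-255. The 200-255 alternatives come first so that an unanchored
# search does not stop after the first two digits of a three-digit octet.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)"

# Three "octet." groups followed by a final octet; {{3}} becomes {3} in the f-string.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_ipv4(candidate):
    """Validate a whole string; fullmatch rejects extra leading or trailing text."""
    return IPV4.fullmatch(candidate) is not None

for address in ["192.168.1.1", "249.206.0.1", "255.255.255.255",
                "256.256.256.256", "1.2.3"]:
    print(address, is_ipv4(address))

# Extraction from running text uses findall instead of fullmatch.
extracted = IPV4.findall("host 10.0.0.255 is up")
print(extracted)
```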


Python’s re Module
The re module provides built-in functions for regex operations.

COMMON FUNCTIONS

re.findall
    Description: Returns a list containing all matches in the order they are found. If no match, empty list.
    Syntax: x = re.findall("regex_expression", text)
    Return Value (x): List of all matched strings

re.search
    Description: Returns a match object for the first match found. Returns None if no match is found.
    Syntax: x = re.search("regex_expression", text)
    Return Value (x): Match object (if found) or None

re.split
    Description: Splits a string into a list at each match. Optionally, limit the splits with maxsplit.
    Syntax: x = re.split("regex_expression", text, [maxsplit])
    Return Value (x): List of separated strings

re.sub
    Description: Replaces one or more matches with a given string. Optionally limit replacements with count.
    Syntax: x = re.sub("regex_expression", "replacement_string", text, count)
    Return Value (x): A new string with substitutions applied
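
The four functions can be compared side by side on one string. This is a minimal sketch with an invented log line; maxsplit and count are passed as keyword arguments, which keeps the calls unambiguous.

```python
import re

# Illustrative log line (an assumption, not from the article).
text = "2024-01-15 ERROR disk full; 2024-01-16 INFO ok"
date = r"\d{4}-\d{2}-\d{2}"

all_dates = re.findall(date, text)                  # list of every match
first_level = re.search(r"ERROR|INFO", text)        # match object for the first hit
no_warning = re.search(r"WARN", text)               # None when nothing matches
parts = re.split(r";\s*", text, maxsplit=1)         # split at the first ';'
masked = re.sub(date, "<date>", text, count=1)      # replace only the first date

print(all_dates)
print(first_level.group(), first_level.start())
print(no_warning)
print(parts)
print(masked)
```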

CODE:
import re

# Sample text with correct and incorrect examples

sample_text = """
Correct Examples:
john.doe123@abc-school.ac.in
why_not.valid+email@gmail.com
Is this your final answer?
"Python is a snake" - is this statement correct?
https://www.example.com?query_param1=value1&query_param2=value2
http://example.org/resource
192.168.1.1
127.0.0.1

Incorrect Examples:
john.doe@com
noatsymbol.com
Is this even correct..
ftp://wrong.protocol.com
256.256.256.256
999.999.999.999
"""

# Regex patterns
patterns = {
    "Email Address": r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    # Note: ",-_" inside the set is a range (comma through underscore) that also
    # covers ".", "/", ":" and "?", which is why a URL prefix ending in "?"
    # appears among the Question matches in the output.
    "Question": r"[a-zA-Z0-9\"'][a-zA-Z0-9\"',-_-+ ]*\?",
    "URL": r"https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?::[0-9]{1,5})?(?:\/[^\s?#]*)?(?:\?[a-zA-Z0-9%._\-~+=&]*)?(?:#[^\s]*)?",
    # 200-255 alternatives first, then 0-199
    "IPv4 Address": r"(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)",
}

def test_regex(pattern_name, pattern, text):
    # Print every non-overlapping match of the pattern found in the text
    print(f"\nTesting: {pattern_name}")
    matches = re.findall(pattern, text)
    print("Matches:")
    for match in matches:
        print(f" - {match}")

# Testing all patterns
for name, regex in patterns.items():
    test_regex(name, regex, sample_text)
OUTPUT:
Testing: Email Address
Matches:
- john.doe123@abc-school.ac.in
- why_not.valid+email@gmail.com
Testing: Question
Matches:
- Is this your final answer?
- "Python is a snake" - is this statement correct?
- https://www.example.com?
Testing: URL
Matches:
- https://www.example.com?query_param1=value1&query_param2=value2
- http://example.org/resource
Testing: IPv4 Address
Matches:
- 192.168.1.1
- 127.0.0.1

Theory References:
https://www.w3schools.com/python/python_regex.asp
https://www.geeksforgeeks.org/components-of-a-url/
