
Regular Expression

This document provides an introduction to regular expressions in Python. It explains what regular expressions are, how they can be used to find patterns in text, and some common metacharacters and quantifiers used in regular expressions like repetition, grouping and alternatives.

Introduction to regular expressions
REGULAR EXPRESSIONS IN PYTHON

Maria Eugenia Inzaugarat


Data Scientist
What is a regular expression?
REGular EXpression or regex:
String containing a combination of normal characters and special metacharacters that describes patterns to find text or positions within a text

Normal characters match themselves ( st )

Metacharacters represent types of characters ( \d , \s , \w ) or ideas ( {3,10} )

Pattern: a sequence of characters that maps to words or punctuation

Pattern matching usage:


Find and replace text

Validate strings

Very powerful and fast

The re module
import re

Find all matches of a pattern:

re.findall(r"#movies", "Love #movies! I had fun yesterday going to the #movies")

['#movies', '#movies']


Split string at each match:

re.split(r"!", "Nice Place to eat! I'll come back! Excellent meat!")

['Nice Place to eat', " I'll come back", ' Excellent meat', '']


Replace one or many matches with a string:

re.sub(r"yellow", "nice", "I have a yellow car and a yellow house in a yellow neighborhood")

'I have a nice car and a nice house in a nice neighborhood'
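The "one or many" part can be made explicit with the optional count argument of re.sub. The following is a minimal sketch that reuses the sentence above; the variable names are illustrative and not part of the original slides.

```python
import re

sentence = "I have a yellow car and a yellow house in a yellow neighborhood"

# Default behavior: every match is replaced.
replace_all = re.sub(r"yellow", "nice", sentence)

# count=1 limits the substitution to the first match only.
replace_first = re.sub(r"yellow", "nice", sentence, count=1)

print(replace_all)
print(replace_first)
```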

Supported metacharacters
re.findall(r"User\d", "The winners are: User9, UserN, User8")

['User9', 'User8']

re.findall(r"User\D", "The winners are: User9, UserN, User8")

['UserN']

re.findall(r"User\w", "The winners are: User9, UserN, User8")

['User9', 'UserN', 'User8']

re.findall(r"\W\d", "This skirt is on sale, only $5 today!")

['$5']

re.findall(r"Data\sScience", "I enjoy learning Data Science")

['Data Science']

re.sub(r"ice\Scream", "ice cream", "I really like ice-cream")

'I really like ice cream'
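The six character classes used above come in lowercase and uppercase pairs, where the uppercase form matches the complement of the lowercase one. The sketch below runs each class against one short made-up string so the pairs can be compared side by side; the sample string and variable names are assumptions for illustration.

```python
import re

sample = "A1 b_2!"

digits = re.findall(r"\d", sample)      # digits
non_digits = re.findall(r"\D", sample)  # anything that is not a digit
word_chars = re.findall(r"\w", sample)  # letters, digits and underscore
non_word = re.findall(r"\W", sample)    # anything that is not a word character
spaces = re.findall(r"\s", sample)      # whitespace
non_spaces = re.findall(r"\S", sample)  # anything that is not whitespace

for name, result in [("\\d", digits), ("\\D", non_digits), ("\\w", word_chars),
                     ("\\W", non_word), ("\\s", spaces), ("\\S", non_spaces)]:
    print(name, result)
```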

Repetitions
Repeated characters
Validate the following string:

import re
password = "password1234"

re.search(r"\w\w\w\w\w\w\w\w\d\d\d\d", password)

<_sre.SRE_Match object; span=(0, 12), match='password1234'>

re.search(r"\w{8}\d{4}", password)

<_sre.SRE_Match object; span=(0, 12), match='password1234'>

Quantifiers:
A metacharacter that tells the regex engine how many times to match the character immediately to its left.
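The validation above uses re.search, which also succeeds when the pattern appears somewhere inside a longer string. One possible way to require that the whole string fits the pattern is re.fullmatch. This is a sketch under that assumption, and the helper name is_valid is hypothetical.

```python
import re

PASSWORD_PATTERN = r"\w{8}\d{4}"

def is_valid(candidate):
    """Return True only if the entire string fits the pattern (hypothetical helper)."""
    return re.fullmatch(PASSWORD_PATTERN, candidate) is not None

print(is_valid("password1234"))       # whole string fits
print(is_valid("xxpassword1234!!"))   # extra characters around the match

# For comparison, re.search still finds a match inside the longer string.
found_inside = re.search(PASSWORD_PATTERN, "xxpassword1234!!") is not None
print(found_inside)
```

Note that \w also matches digits, so a string of twelve digits passes this check as well.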

Quantifiers
Once or more: +

text = "Date of start: 4-3. Date of registration: 10-04."

re.findall(r"\d+-\d+", text)

['4-3', '10-04']

Zero times or more: *

my_string = "The concert was amazing! @ameli!a @joh&&n @mary90"

re.findall(r"@\w+\W*\w+", my_string)

['@ameli!a', '@joh&&n', '@mary90']

Zero times or once: ?

text = "The color of this image is amazing. However, the colour blue could be brighter."

re.findall(r"colou?r", text)

['color', 'colour']

n times at least, m times at most: {n,m}

phone_number = "John: 1-966-847-3131 Michelle: 54-908-42-42424"

re.findall(r"\d{1,2}-\d{3}-\d{2,3}-\d{4,}", phone_number)

['1-966-847-3131', '54-908-42-42424']
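A short sketch of the range quantifier on its own, assuming current CPython behavior, in which a space inside the braces causes the brace expression to be read as literal text rather than as a quantifier. The sample string is made up for illustration.

```python
import re

digits = "123"

# {1,2} is a quantifier: one or two digits per match, taking as many as possible.
compact = re.findall(r"\d{1,2}", digits)

# With a space, "{1, 2}" is treated as literal characters, so nothing matches here.
spaced = re.findall(r"\d{1, 2}", digits)

print(compact)
print(spaced)
```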

Immediately to the left
r"apple+" : + applies to e and not to apple
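To see the "immediately to the left" rule in action, the sketch below compares r"apple+" with a version that wraps the word in a non-capturing group, (?:...), so that the quantifier applies to the whole word. The sample string is an assumption chosen for illustration.

```python
import re

text = "apple appleee appleapple"

# + applies only to the final "e".
last_letter_repeated = re.findall(r"apple+", text)

# The non-capturing group makes + apply to the whole word "apple".
whole_word_repeated = re.findall(r"(?:apple)+", text)

print(last_letter_repeated)
print(whole_word_repeated)
```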

Regex metacharacters
Looking for patterns
Two different operations to find a match: re.search scans the whole string, while re.match only matches at the beginning of the string.

re.search(r"\d{4}", "4506 people attend the show")

<_sre.SRE_Match object; span=(0, 4), match='4506'>

re.match(r"\d{4}", "4506 people attend the show")

<_sre.SRE_Match object; span=(0, 4), match='4506'>

re.search(r"\d+", "Yesterday, I saw 3 shows")

<_sre.SRE_Match object; span=(17, 18), match='3'>

print(re.match(r"\d+","Yesterday, I saw 3 shows"))

None
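Both functions return either a match object or None, so the result is usually checked before it is used. A minimal sketch of that pattern, with illustrative variable names:

```python
import re

text = "Yesterday, I saw 3 shows"

found = re.search(r"\d+", text)    # looks for the pattern anywhere in the string
at_start = re.match(r"\d+", text)  # only tries the beginning of the string

# group() returns the matched text, span() the start and end positions.
if found is not None:
    print(found.group(), found.span())

print(at_start is None)
```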

Special characters
Match any character (except newline): .

my_links = "Just check out this link: www.amazingpics.com. It has amazing photos!"

re.findall(r"www.+com", my_links)

['www.amazingpics.com']
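The "except newline" part can be checked directly. In the sketch below the link is deliberately broken across two lines (a made-up input), and the optional re.DOTALL flag is used to let the dot match the newline as well.

```python
import re

broken_link = "www.amazing\npics.com"

# By default the dot stops at the newline, so no match is found.
default_dot = re.findall(r"www.+com", broken_link)

# With re.DOTALL the dot also matches the newline character.
dotall_dot = re.findall(r"www.+com", broken_link, flags=re.DOTALL)

print(default_dot)
print(dotall_dot)
```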

Start of the string: ^

my_string = "the 80s music was much better that the 90s"

re.findall(r"the\s\d+s", my_string)

['the 80s', 'the 90s']

re.findall(r"^the\s\d+s", my_string)

['the 80s']

End of the string: $

my_string = "the 80s music hits were much better that the 90s"

re.findall(r"the\s\d+s$", my_string)

['the 90s']
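The two anchors can be combined so that the pattern has to cover the string from its first to its last character. A small sketch, with input strings chosen for illustration:

```python
import re

decade_only = r"^the\s\d+s$"

# The whole string is exactly "the" + whitespace + digits + "s".
exact = re.findall(decade_only, "the 80s")

# Extra words after the decade prevent $ from matching.
with_extra_text = re.findall(decade_only, "the 80s music hits")

print(exact)
print(with_extra_text)
```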

Escape special characters: \

my_string = "I love the music of Mr.Go. However, the sound was too loud."

print(re.split(r".\s", my_string))

['', 'lov', 'th', 'musi', 'o', 'Mr.Go', 'However', 'th', 'soun', 'wa', 'to', 'loud.']

print(re.split(r"\.\s", my_string))

['I love the music of Mr.Go', 'However, the sound was too loud.']
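When the text to search for is not known in advance, the backslashes can be added automatically with re.escape instead of by hand. The sketch below is an illustration under that assumption; the sample strings are made up.

```python
import re

literal = "www.99.com"
haystack = "www.99.com and wwwx99ycom"

# Unescaped, each dot matches any character, so both candidates are found.
unescaped = re.findall(literal, haystack)

# re.escape puts a backslash in front of the dots, so they match only a literal dot.
escaped_pattern = re.escape(literal)
escaped = re.findall(escaped_pattern, haystack)

print(escaped_pattern)
print(unescaped)
print(escaped)
```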

OR operator
Character: |

my_string = "Elephants are the world's largest land animal! I would love to see an elephant one day"

re.findall(r"Elephant|elephant", my_string)

['Elephant', 'elephant']
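The alternation above repeats almost the whole word on both sides of the | character. Two possible shorter spellings are sketched here: limiting the alternative to the first letter with a non-capturing group, or using the re.IGNORECASE flag. Both are offered as illustrations rather than as the approach taken in the slides.

```python
import re

my_string = ("Elephants are the world's largest land animal! "
             "I would love to see an elephant one day")

# Only the first letter differs, so only that letter needs the alternative.
grouped = re.findall(r"(?:E|e)lephant", my_string)

# re.IGNORECASE makes every letter in the pattern match both cases.
ignore_case = re.findall(r"elephant", my_string, flags=re.IGNORECASE)

print(grouped)
print(ignore_case)
```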

Set of characters: [ ]

my_string = "Yesterday I spent my afternoon with my friends: MaryJohn2 Clary3"

re.findall(r"[a-zA-Z]+\d", my_string)

['MaryJohn2', 'Clary3']

Set of characters: [ ]

my_string = "My&name&is#John Smith. I%live$in#London."

re.sub(r"[#$%&]", " ", my_string)

'My name is John Smith. I live in London.'

Set of characters: [ ]
^ as the first character inside the brackets negates the set: it matches any character that is not listed

my_links = "Bad website: www.99.com. Favorite site: www.hola.com"


re.findall(r"www[^0-9]+com", my_links)

['www.hola.com']
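A negated set is also handy for stripping everything except the characters of interest. The sketch below reuses one of the phone numbers from earlier and additionally shows that ^ is an ordinary character when it is not the first one inside the brackets; the second input string is made up.

```python
import re

# Remove every character that is not a digit.
digits_only = re.sub(r"[^0-9]", "", "1-966-847-3131")

# When ^ is not first in the set, it is just a literal caret.
literal_caret = re.findall(r"[0-9^]", "2^8")

print(digits_only)
print(literal_caret)
```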

Greedy vs. non-greedy matching
Two types of matching methods:
Greedy

Non-greedy or lazy

Standard quantifiers are greedy by default: * , + , ? , {num,num}

Greedy matching
Greedy: match as many characters as possible

Return the longest match

import re
re.match(r"\d+", "12345bcada")

<_sre.SRE_Match object; span=(0, 5), match='12345'>

Backtracks when too many characters are matched

Gives up characters one at a time

import re
re.match(r".*hello", "xhelloxxxxxx")

<_sre.SRE_Match object; span=(0, 6), match='xhello'>

Non-greedy matching
Lazy: match as few characters as needed

Returns the shortest match

Append ? to greedy quantifiers

import re
re.match(r"\d+?", "12345bcada")

<_sre.SRE_Match object; span=(0, 1), match='1'>

Backtracks when too few characters matched

Expands one character at a time

import re
re.match(r".*?hello", "xhelloxxxxxx")

<_sre.SRE_Match object; span=(0, 6), match='xhello'>
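In the "xhello" example both variants end up with the same match, so the difference is easier to see on a string where the closing character occurs more than once. The sketch below uses a made-up string with angle-bracket tags for that purpose.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string and backtracks to the last ">".
greedy = re.findall(r"<.+>", text)

# Lazy: .+? stops at the first ">" it can reach, giving one match per tag.
lazy = re.findall(r"<.+?>", text)

print(greedy)
print(lazy)
```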
