Extract Data from a String with Python Regular Expressions



This article shows how to extract data from a string with Python regular expressions. Extracting data from a string is a common task in Python, and regular expressions (regex) provide pattern matching to identify and retrieve specific parts of a string.

Python's re module makes it easy to work with regex. Its most commonly used functions for extracting data are re.search(), re.findall() and re.match().
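As a quick illustration of how these three functions differ, here is a minimal sketch. The sample text is made up for this comparison and is not part of the examples that follow.

```python
import re

# Sample text, invented for illustration only
txt = "Order 66 shipped, order 77 pending"

# re.match() only tries to match at the very start of the string
at_start = re.match(r"\d+", txt)
print(at_start)            # None, because txt starts with "Order"

# re.search() scans the string and returns the first match object
first = re.search(r"\d+", txt)
print(first.group())       # 66

# re.findall() returns every non-overlapping match as a list of strings
every = re.findall(r"\d+", txt)
print(every)               # ['66', '77']
```

In short, re.findall() is usually the natural choice when you want every occurrence, while re.search() and re.match() return a single match object or None.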

Extract Data Using Regular Expressions

To extract data, we define a pattern and apply it to a string with one of the regex functions. The following examples show how to extract different kinds of data with Python's regular expressions:

Example: Extracting Digits from a String

The following example extracts the digits from a given string using the regex pattern "\d+", which matches one or more consecutive digits. The matches are returned as a list of strings.

Numbers are often embedded in text data and need to be extracted for further use, which is exactly what this example does.

# Import re module
import re

# define your text here
txt = "My ID: 89456, Ref num: 7863"

# extract all the numbers from the string using findall()
nums = re.findall(r"\d+", txt)

# print the result
print(nums)

Output

This will produce the following output:

['89456', '7863']
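Note that re.findall() returns the numbers as strings. If you need them for arithmetic, one possible follow-up step (a sketch, not the only approach) is to convert each match with int():

```python
import re

txt = "My ID: 89456, Ref num: 7863"

# findall() returns strings, so convert each match explicitly
nums = [int(n) for n in re.findall(r"\d+", txt)]

print(nums)        # [89456, 7863]
print(sum(nums))   # 97319
```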

Example: Extracting Email Addresses

Here we use the regex pattern "\b[\w.-]+@[\w.-]+\.\w+\b" to find email addresses in text. This pattern matches email addresses like username@domain.com.

It looks for one or more letters, digits, underscores, dots or hyphens before and after the @, followed by a dot and a domain ending (like .com). It is a simplified pattern and does not cover every valid address format. To apply the pattern we use the re.findall() method of Python's re module.

# Import re module
import re

# define your text here which contains email ids
txt = "Contact us at contact@tutorialspoint.com or info@tutorix.com"

# Extract email IDs
emails = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", txt)  

# print the result
print(emails)

Output

This will generate the following result:

['contact@tutorialspoint.com', 'info@tutorix.com']
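If you also want the user name and the domain separately, one option is to add capture groups to the same pattern. This is a sketch under the same simplified assumptions about what an email address looks like. When a pattern contains more than one group, re.findall() returns a list of tuples, one element per group.

```python
import re

txt = "Contact us at contact@tutorialspoint.com or info@tutorix.com"

# Parentheses capture the part before and the part after the @
parts = re.findall(r"\b([\w.-]+)@([\w.-]+\.\w+)\b", txt)

for user, domain in parts:
    print(user, domain)
```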

Example: Extracting Hashtags

Hashtags are widely used on platforms like Twitter and Instagram. This example shows how to find and extract the hashtags from a social media post.

The pattern used for this task is "#\w+", which matches a # followed by one or more word characters. The matching tags are returned in a list.

# Import re module
import re

# define your text here which contains hash tags
txt = "Latest trending topics are: #Python #Coding #AI"

# Extract hashtags using the findall() method
tags = re.findall(r"#\w+", txt)  

# print the result
print(tags)

Output

This will produce the following result:

['#Python', '#Coding', '#AI']
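One caveat: "#\w+" also matches a # that appears in the middle of other text, such as a URL fragment. If that matters for your data, a possible refinement (an assumption about what should count as a hashtag, not a rule from any platform) is a negative lookbehind that rejects a # preceded by a word character. The sample text below is invented for illustration.

```python
import re

# Sample text with a URL-style fragment, invented for illustration
txt = "See docs/page#intro for #Python and #AI"

# The plain pattern also picks up the fragment after "page"
loose = re.findall(r"#\w+", txt)
print(loose)    # ['#intro', '#Python', '#AI']

# (?<!\w) requires that the # is not directly preceded by a word character
strict = re.findall(r"(?<!\w)#\w+", txt)
print(strict)   # ['#Python', '#AI']
```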

Example: Extracting Dates

This program uses the pattern "\d{4}-\d{2}-\d{2}" to match dates written like 2025-05-29.

It looks for four digits, a dash, two digits, a dash and two more digits. The pattern checks only the shape of the text, not whether the month and day values are valid.

# Import re module
import re

# define your text here which contains some dates
txt = "Events: 2023-08-15, 2025-05-29, 2024-12-01"

# Extract dates using the findall() method
dates = re.findall(r"\d{4}-\d{2}-\d{2}", txt)  

# print the result
print(dates)

Output

This will lead to the following outcome:

['2023-08-15', '2025-05-29', '2024-12-01']
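Because the pattern matches any digits in that layout, a string such as 2025-13-40 would also be returned. A minimal sketch of one way to filter out such values is to pass each candidate to datetime.strptime() from the standard library. The invalid date in the sample text is added deliberately for this illustration.

```python
import re
from datetime import datetime

# The middle value is deliberately invalid (month 13, day 40)
txt = "Events: 2023-08-15, 2025-13-40, 2024-12-01"

valid_dates = []
for candidate in re.findall(r"\d{4}-\d{2}-\d{2}", txt):
    try:
        # strptime() raises ValueError for impossible dates
        valid_dates.append(datetime.strptime(candidate, "%Y-%m-%d").date())
    except ValueError:
        pass  # skip text that looks like a date but is not one

# Print the surviving dates in ISO format
print([d.isoformat() for d in valid_dates])   # ['2023-08-15', '2024-12-01']
```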
Updated on: 2025-06-06T15:29:37+05:30
