
Extract Data from a String with Python Regular Expressions
This article shows how to extract data from a string with Python regular expressions. Extracting data from a string is a common task, and regular expressions (regex) provide pattern matching to identify and pull out specific parts of a string.
Python's re module makes it easy to work with regex. Its most commonly used functions for extracting data are re.search(), re.findall() and re.match().
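The three functions named above differ in where they look and what they return. The short sketch below illustrates this; the sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, used only for this illustration
text = "Order 42 shipped, order 57 pending"

# re.match() only tries to match at the very start of the string
at_start = re.match(r"\d+", text)

# re.search() scans the string and returns the first match found anywhere
first = re.search(r"\d+", text)

# re.findall() returns every non-overlapping match as a list of strings
every = re.findall(r"\d+", text)

print(at_start)       # None, because the string starts with "Order"
print(first.group())  # 42
print(every)          # ['42', '57']
```

In short, re.match() and re.search() return a match object (or None), while re.findall() returns a plain list, which is why the examples that follow use re.findall().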
Extract Data Using Regular Expressions
To extract data, define a pattern and apply it to a string using one of the regex functions. The following examples extract different kinds of data with Python's regular expressions:
Example: Extracting Digits from a String
The following example extracts digits from a string using the regex pattern "\d+", which matches one or more consecutive digits. The matches are returned as a list of strings.
Numbers are often embedded in text data and need to be extracted for further use, which is the task this pattern handles.
# Import re module
import re

# define your text here
txt = "My ID: 89456, Ref num: 7863"

# extract all the numbers from the string using findall()
nums = re.findall(r"\d+", txt)

# print the result
print(nums)
Output
This will produce the following output:
['89456', '7863']
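Note that re.findall() returns the digits as strings, not numbers. If the values are needed for arithmetic, one possible follow-up step is to convert them with int(), as in this small sketch that reuses the text from the example:

```python
import re

txt = "My ID: 89456, Ref num: 7863"

# Convert each matched digit string to an integer
nums = [int(n) for n in re.findall(r"\d+", txt)]

print(nums)  # [89456, 7863]
```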
Example: Extracting Email Addresses
Here we use the regex pattern "\b[\w.-]+@[\w.-]+\.\w+\b" to find email addresses in text. This pattern matches addresses like username@domain.com.
It looks for letters, digits, underscores, dots or hyphens before and after the @, followed by a dot and a domain ending (like .com). To apply the pattern we use the re.findall() function of Python's re module.
# Import re module
import re

# define your text here which contains email ids
txt = "Contact us at contact@tutorialspoint.com or info@tutorix.com"

# Extract email IDs
emails = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", txt)

# print the result
print(emails)
Output
This will generate the following result:
['contact@tutorialspoint.com', 'info@tutorix.com']
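If the user name and the domain are needed separately, the same pattern can be extended with groups. The sketch below is one way to do this; the group names user and domain are chosen here for illustration and are not part of the original example.

```python
import re

txt = "Contact us at contact@tutorialspoint.com or info@tutorix.com"

# Same pattern as above, with two named groups added (names are illustrative)
pattern = re.compile(r"\b(?P<user>[\w.-]+)@(?P<domain>[\w.-]+\.\w+)\b")

# finditer() yields match objects, so each group can be read by name
parts = [(m.group("user"), m.group("domain")) for m in pattern.finditer(txt)]

print(parts)  # [('contact', 'tutorialspoint.com'), ('info', 'tutorix.com')]
```

This pattern is a practical approximation and does not cover every address that the email standards allow.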
Example: Extracting Hashtags
Hashtags are widely used on platforms like Twitter and Instagram. This example finds and extracts the hashtags from a social media post.
The pattern for this task is "#\w+", which matches a # followed by one or more word characters (letters, digits or underscores). The matching tags are returned as a list.
# Import re module
import re

# define your text here which contains hash tags
txt = "Latest trending topics are: #Python #Coding #AI"

# Extract hashtags using the findall() method
tags = re.findall(r"#\w+", txt)

# print the result
print(tags)
Output
This will produce the following result:
['#Python', '#Coding', '#AI']
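When only the tag names are wanted, without the leading #, a capture group can be added to the pattern. With a single group, re.findall() returns just the captured part. The lowercasing step in this sketch is an optional extra, added here on the assumption that tags will be compared case-insensitively.

```python
import re

txt = "Latest trending topics are: #Python #Coding #AI"

# The parentheses capture the word after '#', so the '#' itself is dropped
names = [tag.lower() for tag in re.findall(r"#(\w+)", txt)]

print(names)  # ['python', 'coding', 'ai']
```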
Example: Extracting Dates
In this program we use the pattern "\d{4}-\d{2}-\d{2}" to match dates written like 2025-05-29.
It looks for four digits, a dash, two digits, a dash and two more digits. Note that the pattern checks only the format, not whether the date exists on the calendar.
# Import re module
import re

# define your text here which contains some dates
txt = "Events: 2023-08-15, 2025-05-29, 2024-12-01"

# Extract dates using the findall() method
dates = re.findall(r"\d{4}-\d{2}-\d{2}", txt)

# print the result
print(dates)
Output
This will lead to the following outcome:
['2023-08-15', '2025-05-29', '2024-12-01']
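Because "\d{4}-\d{2}-\d{2}" accepts any digits, a string such as 2025-13-45 would also match. One possible way to keep only real dates is to pass each match through datetime.strptime() from the standard library, as sketched below. The sample text with an invalid date is made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical sample text; the middle entry is not a real calendar date
txt = "Events: 2023-08-15, 2025-13-45, 2024-12-01"

valid_dates = []
for candidate in re.findall(r"\d{4}-\d{2}-\d{2}", txt):
    try:
        # strptime() raises ValueError for impossible dates such as month 13
        valid_dates.append(datetime.strptime(candidate, "%Y-%m-%d").date())
    except ValueError:
        continue

print([d.isoformat() for d in valid_dates])  # ['2023-08-15', '2024-12-01']
```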