Working With Text Data in Python

The document provides examples of using string methods in pandas to manipulate and extract information from text data. It shows how to format, detect patterns, extract matches, split, modify case, pad and join strings using methods like .str.contains(), .str.split(), .str.lower(), .str.pad(), and others. The examples use pandas Series containing string data about suits of cards and rock/paper/scissors to demonstrate the various string manipulation techniques.

Uploaded by

Clóvis Nóbrega

Learn Python online at www.DataCamp.com

> Example data used throughout this cheat sheet

Throughout this cheat sheet, we'll be using two pandas Series named suits and rock_paper_scissors.

import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

rock_paper_scissors = pd.Series(["rock ", " paper", "scissors"])

> Formatting settings

# Generate an example DataFrame named df
df = pd.DataFrame({"x": [0.123, 4.567, 8.901]})
#        x
# 0  0.123
# 1  4.567
# 2  8.901

# Visualize and format table output (the output of style.format is an HTML table)
df.style.format(precision = 1)
#      x
# 0  0.1
# 1  4.6
# 2  8.9

> String lengths and substrings

# Get the number of characters with .str.len()
suits.str.len() # Returns 5 8 6 6

# Get substrings by position with .str[]
suits.str[2:5] # Returns "ubs" "amo" "art" "ade"

# Get substrings by negative position with .str[]
suits.str[:-3] # "cl" "Diamo" "hea" "Spa"

# Remove whitespace from the start/end with .str.strip()
rock_paper_scissors.str.strip() # "rock" "paper" "scissors"

# Pad strings to a given length with .str.pad()
suits.str.pad(8, fillchar="_") # "___clubs" "Diamonds" "__hearts" "__Spades"

> Detecting matches

# Detect if a regex pattern is present in strings with .str.contains()
suits.str.contains("[ae]") # False True True True

# Count the number of matches with .str.count()
suits.str.count("[ae]") # 0 1 2 2

# Locate the position of substrings with .str.find()
suits.str.find("e") # -1 -1 1 4

> Extracting matches

# Extract matches from strings with .str.findall()
suits.str.findall(".[ae]") # [] ["ia"] ["he"] ["pa", "de"]

# Extract capture groups with .str.extractall()
suits.str.extractall("([ae])(.)")
#          0  1
#   match
# 1 0      a  m
# 2 0      e  a
# 3 0      a  d
#   1      e  s

# Get subset of strings that match with x[x.str.contains()]
suits[suits.str.contains("d")] # "Diamonds" "Spades"

> Splitting strings

# Split strings into list of characters with .str.split(pat="")
suits.str.split(pat="")
# ["", "c", "l", "u", "b", "s", ""]
# ["", "D", "i", "a", "m", "o", "n", "d", "s", ""]
# ["", "h", "e", "a", "r", "t", "s", ""]
# ["", "S", "p", "a", "d", "e", "s", ""]

# Split strings by a separator with .str.split()
suits.str.split(pat = "a")
# ["clubs"]
# ["Di", "monds"]
# ["he", "rts"]
# ["Sp", "des"]

# Split strings and return DataFrame with .str.split(expand=True)
suits.str.split(pat = "a", expand=True)
#        0      1
# 0  clubs   None
# 1     Di  monds
# 2     he    rts
# 3     Sp    des

> Replacing matches

# Replace a match with another string with .str.replace() (pass regex=True for regex patterns)
suits.str.replace("a", "4") # "clubs" "Di4monds" "he4rts" "Sp4des"

# Remove a suffix with .str.removesuffix()
suits.str.removesuffix("s") # "club" "Diamond" "heart" "Spade"

# Replace a substring with .str.slice_replace()
rhymes = pd.Series(["vein", "gain", "deign"])
rhymes.str.slice_replace(0, 1, "r") # "rein" "rain" "reign"
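The .str.replace() example in this cheat sheet uses a single literal character, so it behaves the same whether or not the pattern is read as a regular expression. As a minimal sketch, assuming a pandas version that accepts the regex keyword, passing regex explicitly makes the intent clear. The variable names literal and pattern are illustrative only.

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# Literal replacement: only the exact substring "a" is replaced.
literal = suits.str.replace("a", "4", regex=False)

# Regex replacement: the character class [ae] matches either letter.
pattern = suits.str.replace("[ae]", "_", regex=True)

print(literal.tolist())  # ['clubs', 'Di4monds', 'he4rts', 'Sp4des']
print(pattern.tolist())  # ['clubs', 'Di_monds', 'h__rts', 'Sp_d_s']
```

Being explicit avoids depending on the default, which changed to regex=False in pandas 2.0.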

> Changing case

# Convert to lowercase with .str.lower()
suits.str.lower() # "clubs" "diamonds" "hearts" "spades"

# Convert to uppercase with .str.upper()
suits.str.upper() # "CLUBS" "DIAMONDS" "HEARTS" "SPADES"

# Convert to title case with .str.title()
pd.Series("hello, world!").str.title() # "Hello, World!"

# Convert to sentence case with .str.capitalize()
pd.Series("hello, world!").str.capitalize() # "Hello, world!"

> Joining or concatenating strings

# Combine two strings with +
suits + "5" # "clubs5" "Diamonds5" "hearts5" "Spades5"

# Collapse a Series of strings into a single string with .str.cat()
suits.str.cat(sep=", ") # "clubs, Diamonds, hearts, Spades"

# Duplicate and concatenate strings with *
suits * 2 # "clubsclubs" "DiamondsDiamonds" "heartshearts" "SpadesSpades"
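Besides collapsing one Series into a single string, .str.cat() can also join two Series element by element. The sketch below assumes a second Series named ranks, which is hypothetical and not part of the original cheat sheet, and uses na_rep to fill in a missing value.

```python
import pandas as pd

suits = pd.Series(["clubs", "Diamonds", "hearts", "Spades"])

# Hypothetical second Series for illustration; one value is missing.
ranks = pd.Series(["2", "K", None, "A"])

# Join the two Series element-wise, substituting "?" for missing values.
paired = suits.str.cat(ranks, sep="-", na_rep="?")

print(paired.tolist())  # ['clubs-2', 'Diamonds-K', 'hearts-?', 'Spades-A']
```

Without na_rep, any row with a missing value on either side would come out as missing in the result.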
