Web Systems Security
CY430
Lecture 5: Server Scripting
PHP
part II
String Processing with Regular Expressions
• Text manipulation is usually done with regular
expressions
• They are a series of characters that serve as pattern-
matching templates (or search criteria) in strings, text
files and databases.
• Function preg_match uses regular expressions to search
a string for a specified pattern.
String Processing with Regular Expressions
$search = "Now is the time";
if(preg_match( "/Now/", $search ) )
print( "<p>'Now' was found.</p>" );
• Call preg_match function to search for pattern 'Now' in
variable $search.
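• preg_match returns 1 if a match was found, 0 if no
match was found, and false if an error occurred (for example, an invalid pattern).
• In a boolean context, a return value of 1 evaluates to true,
so the body of the if statement runs when the pattern is found.
• A return value of 0 evaluates to false.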
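The three possible return values can be told apart with a strict comparison. The following is a minimal sketch; the helper name describeMatch is hypothetical and not part of PHP, and the @ operator is used here only to silence the warning that an invalid pattern triggers.

```php
<?php
// Hypothetical helper: turn preg_match's return value into a label.
function describeMatch(string $pattern, string $subject): string
{
    // @ suppresses the warning PHP emits for an invalid pattern,
    // so the false return value can be inspected directly.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return "error";          // the pattern could not be compiled or run
    }
    return $result === 1 ? "found" : "not found";
}

echo describeMatch("/Now/", "Now is the time"), "\n";  // found
echo describeMatch("/now/", "Now is the time"), "\n";  // not found (the search is case-sensitive)
echo describeMatch("/Now", "Now is the time"), "\n";   // error (closing delimiter is missing)
```

Because false and 0 both behave as false inside an if statement, the strict === false check is the only way to distinguish an error from a simple non-match.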
Example
<?php
$text = "The quick brown fox jumps over the lazy dog";
// Pattern to match any occurrence of the words "quick", "fox", or "dog"
$pattern = '/\b(quick|fox|dog)\b/i';
if (preg_match($pattern, $text)) {
echo "Match found";
} else {
echo "No match found";
}
?>
Character Classes
• Square brackets surrounding a pattern of
characters are called a character class e.g.
[abc].
• A character class always matches a single
character out of a list of specified characters
Example
<?php
$text = "He was eating cake in the cafe.";
$pattern = "/ca[kf]e/";
if(preg_match($pattern, $text)){
echo "Match found!";
} else{
echo "Match not found.";}
?>
Character Classes
RegExp      What it Does
[abc]       Matches any one of the characters a, b, or c.
[^abc]      Matches any one character other than a, b, or c.
[a-z]       Matches any one character from lowercase a to lowercase z.
[A-Z]       Matches any one character from uppercase A to uppercase Z.
[a-zA-Z]    Matches any one letter, lowercase a to z or uppercase A to Z.
[0-9]       Matches a single digit between 0 and 9.
[a-z0-9]    Matches a single character between a and z or between 0 and 9.
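The classes in the table can be tried against short strings. This is a small sketch; the helper firstMatch is a hypothetical name introduced for illustration. It returns the first character that the class matches, using the optional third argument of preg_match.

```php
<?php
// Hypothetical helper: return the first text matched by $pattern, or null.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/[abc]/', 'zebra'));      // string(1) "b"
var_dump(firstMatch('/[^abc]/', 'cab42'));     // string(1) "4"  (first character that is not a, b, or c)
var_dump(firstMatch('/[0-9]/', 'cab42'));      // string(1) "4"
var_dump(firstMatch('/[A-Z]/', 'cab42'));      // NULL          (no uppercase letter present)
var_dump(firstMatch('/[a-z0-9]/', 'Room 7'));  // string(1) "o"  (R is uppercase, so it is skipped)
```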
Example
<?php
$strings = array('apple', 'Banana', 'cherry', '12345', '$%^&*', 'APPLE');
$pattern = '/[a-zA-Z]/';
foreach ($strings as $string) {
    if (preg_match($pattern, $string)) {
        echo "The string '$string' contains at least one letter.<br>";
    } else {
        echo "The string '$string' does not contain any letters.<br>";
    }
}
?>
Character classes (cont.)
Sr.No  Expression    Description
1      [[:alpha:]]   Matches any single alphabetic character, a through z or A through Z.
2      [[:digit:]]   Matches any single digit, 0 through 9.
3      [[:alnum:]]   Matches any single alphanumeric character: a letter or a digit.
4      [[:space:]]   Matches any single whitespace character, such as a space, tab, or newline.
Because preg_match searches anywhere in the string, each expression succeeds on any string that contains at least one such character.
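The named classes can be checked the same way as the bracketed ranges. This is a minimal sketch; containsClass is a hypothetical helper that builds the pattern from a class name and reports whether the string contains at least one matching character.

```php
<?php
// Hypothetical helper: does $subject contain a character of the named POSIX class?
function containsClass(string $class, string $subject): bool
{
    // Builds e.g. /[[:alpha:]]/ : the class name sits inside an outer bracket pair.
    $pattern = '/[[:' . $class . ':]]/';
    return preg_match($pattern, $subject) === 1;
}

var_dump(containsClass('alpha', '12345'));       // bool(false)
var_dump(containsClass('digit', 'abc123'));      // bool(true)
var_dump(containsClass('alnum', '$%^&*'));       // bool(false)
var_dump(containsClass('space', 'two words'));   // bool(true)
var_dump(containsClass('space', "tab\there"));   // bool(true), a tab counts as whitespace
```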
Repetition Quantifiers
• In the previous section we've learned how to
match a single character in a variety of
fashions. But what if you want to match on
more than one character?
Repetition Quantifiers
RegExp   What it Does
p+       Matches one or more occurrences of the letter p.
p*       Matches zero or more occurrences of the letter p.
p?       Matches zero or one occurrence of the letter p.
p{2}     Matches exactly two occurrences of the letter p.
p{2,3}   Matches at least two but not more than three occurrences of the letter p.
p{2,}    Matches two or more occurrences of the letter p.
p{0,3}   Matches at most three occurrences of the letter p. Write the lower bound explicitly: older PCRE versions treat p{,3} as literal text, not as a quantifier.
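To see how much text each quantifier consumes, the matched text can be printed. This is a small sketch; matchedText is a hypothetical helper name, and the sample words are chosen only for illustration.

```php
<?php
// Hypothetical helper: return the text matched by $pattern, or null if there is no match.
function matchedText(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(matchedText('/ap+/', 'apple'));        // string(3) "app"
var_dump(matchedText('/ap*/', 'ale'));          // string(1) "a"   (zero p's is allowed)
var_dump(matchedText('/ap?/', 'apple'));        // string(2) "ap"  (at most one p)
var_dump(matchedText('/p{2}/', 'apple'));       // string(2) "pp"
var_dump(matchedText('/p{2,3}/', 'apppple'));   // string(3) "ppp"
var_dump(matchedText('/p{2,}/', 'apppple'));    // string(4) "pppp"
var_dump(matchedText('/ap{0,3}/', 'apppple'));  // string(4) "appp"
var_dump(matchedText('/p{2}/', 'paper'));       // NULL            (no two p's in a row)
```

Quantifiers are greedy by default, which is why p{2,3} takes three letters rather than two when three are available.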
Pattern Modifiers
• A pattern modifier lets you control the
way a pattern match is handled. Pattern
modifiers are placed directly after the closing
delimiter of the regular expression.
Modifier What it Does
i Makes the match case-insensitive.
String Processing with Regular Expressions
• Function preg_match takes two arguments, a regular-expression
pattern to search for and the string to search.
• The optional third argument to function preg_match is an array
that stores matches to the regular expression.
• The regular expression must be enclosed in delimiters; typically a
forward slash (/) is placed at the beginning and end of the
regular-expression pattern.
$a="windoW"; $a="%8windoW"; $a="ow"; // three values to try in turn
preg_match( "/\b([a-zA-Z]*ow)\b/i", $a, $x );
• Searches for any word ending in 'ow'
• The "i" after the pattern delimiter indicates a case-insensitive
search
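The sketch below runs the same pattern against each of the three sample values and prints what the third argument receives. The function name wordEndingInOw is hypothetical and introduced only for this illustration.

```php
<?php
// Hypothetical helper: return the first word ending in "ow" (any case), or null.
function wordEndingInOw(string $subject): ?string
{
    // $x[0] holds the whole match, $x[1] the text captured by the parentheses.
    if (preg_match('/\b([a-zA-Z]*ow)\b/i', $subject, $x) === 1) {
        return $x[1];
    }
    return null;
}

foreach (['windoW', '%8windoW', 'ow'] as $a) {
    $word = wordEndingInOw($a);
    echo $a, ' => ', $word === null ? 'no match' : "matched '$word'", "\n";
}
// windoW   => matched 'windoW'
// %8windoW => no match
// ow       => matched 'ow'
```

The second value fails because the digit 8 is a word character: there is no word boundary between 8 and w, and [a-zA-Z]* cannot consume the 8 itself.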
Example
<?php
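$search = "windows is the time";
// "windows" does not match: the "s" after "ow" means there is no word boundary there
if (preg_match("/\b([a-zA-Z]*ow)\b/i", $search))
    print("<p>A word ending in 'ow' was found.</p>");
else
    print("not found"); // this branch runs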
?>
Example
$search = "My Application";
if (preg_match("/[m]/i", $search, $match)) {
    print(" was found.");
    print($match[0]); // prints "M"
} else {
    print(" not found.");
}
Word Boundaries
• A word boundary character (\b) helps you search for
words that begin and/or end with a pattern.
• For example, the regexp /\bcar/ matches the words
beginning with the pattern car, and would match
cart, carrot, or cartoon, but would not match oscar.
• Similarly, the regexp /car\b/ matches the words
ending with the pattern car, and would match scar,
oscar, or supercar, but would not match cart.
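The two boundary patterns above can be checked against the listed words. This is a minimal sketch; the two function names are hypothetical helpers.

```php
<?php
// Hypothetical helpers wrapping the two patterns from the text.
function hasWordStartingWithCar(string $text): bool
{
    return preg_match('/\bcar/', $text) === 1;
}

function hasWordEndingWithCar(string $text): bool
{
    return preg_match('/car\b/', $text) === 1;
}

foreach (['cart', 'carrot', 'cartoon', 'oscar', 'scar', 'supercar'] as $word) {
    printf(
        "%-9s starts with car: %-3s  ends with car: %s\n",
        $word,
        hasWordStartingWithCar($word) ? 'yes' : 'no',
        hasWordEndingWithCar($word) ? 'yes' : 'no'
    );
}
// cart, carrot, cartoon: starts yes, ends no
// oscar, scar, supercar: starts no,  ends yes
```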
String Processing with Regular Expressions
The pattern /\b(t[[:alpha:]]+)\b/i matches any word
beginning with the character t followed by one or more
letters.
Would match: The, tea, triangle
Would not match: t, the5
Try with: /\b (t[[:alpha:]]+)\b/i // be careful about the space: it is a literal character in the pattern
The pattern uses the character class [[:alpha:]] to recognize
any letter; this is equivalent to [a-zA-Z]. The class name [:alpha:] must appear inside an outer pair of square brackets.
Examples
• He?llo would match: Hllo and Hello
• (He)?llo would match: llo and Hello
• Hello+ would match: Hello, Helloooooooo but not Hell
• Abc{2} would match: Abcc
• gr[^io]s would match: gras but not gris, gros, graas
• Bla(bla)* would match: Bla, Blablabla
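The examples above can be verified with a short script. This is a sketch under the assumption that each sample word is tested on its own as the whole subject string; the helper name matches is hypothetical.

```php
<?php
// Hypothetical helper: true when $pattern is found somewhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Each entry pairs a pattern with one sample word from the list above.
$cases = [
    ['/He?llo/',    'Hllo'],      ['/He?llo/',    'Hello'],
    ['/(He)?llo/',  'llo'],       ['/(He)?llo/',  'Hello'],
    ['/Hello+/',    'Helloooo'],  ['/Hello+/',    'Hell'],
    ['/Abc{2}/',    'Abcc'],      ['/Abc{2}/',    'Abc'],
    ['/gr[^io]s/',  'gras'],      ['/gr[^io]s/',  'gris'],
    ['/gr[^io]s/',  'graas'],     ['/Bla(bla)*/', 'Blablabla'],
];

foreach ($cases as [$pattern, $word]) {
    printf("%-12s %-10s %s\n", $pattern, $word, matches($pattern, $word) ? 'match' : 'no match');
}
```

Hell, Abc, gris, and graas are the four cases reported as "no match"; every other pair matches.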
Example
$search = "grabs is the time";
if(preg_match( "/gr[^io]{2}s/i", $search ,
$x) )
print( "<p> was found. $x[0]</p>" );
else print("not found");