JAVA - REGULAR EXPRESSIONS
http://www.tutorialspoint.com/java/java_regular_expressions.htm Copyright © tutorials point.com
Java provides the java.util.regex package for pattern matching with regular expressions. Java regular
expressions are very similar to those of the Perl programming language and are easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or
sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or
manipulate text and data.
The java.util.regex package primarily consists of the following three classes:
Pattern Class: A Pattern object is a compiled representation of a regular expression. The
Pattern class provides no public constructors. To create a pattern, you must first invoke one
of its public static compile methods, which will then return a Pattern object. These methods
accept a regular expression as the first argument.
Matcher Class: A Matcher object is the engine that interprets the pattern and performs
match operations against an input string. Like the Pattern class, Matcher defines no public
constructors. You obtain a Matcher object by invoking the matcher method on a Pattern
object.
PatternSyntaxException: A PatternSyntaxException object is an unchecked exception
that indicates a syntax error in a regular expression pattern.
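As a minimal sketch of how the three classes fit together, consider the following. The class name RegexBasics and the helpers isValid and firstMatch are hypothetical names chosen for illustration; only Pattern, Matcher, and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Hypothetical helper: returns true if the regular expression compiles.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex);  // static factory; Pattern has no public constructor
        Matcher m = p.matcher(input);        // a Matcher is obtained from the Pattern
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "order QT3000"));  // prints 3000
        System.out.println(isValid("(unclosed"));                // prints false
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch in isValid is optional; it is shown here only to make the failure case visible.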
Capturing Groups:
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing
the characters to be grouped inside a set of parentheses. For example, the regular expression (dog)
creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the
expression ((A)(B(C))), for example, there are four such groups:
((A)(B(C)))
(A)
(B(C))
(C)
To find out how many groups are present in the expression, call the groupCount method on a
matcher object. The groupCount method returns an int showing the number of capturing groups
present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is
not included in the total reported by groupCount.
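The numbering rule can be checked with a short sketch. The class name GroupCountDemo and the helper countGroups are hypothetical; the expression ((A)(B(C))) is the one discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: number of capturing groups in a regular expression.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        System.out.println("groupCount(): " + m.groupCount());
        if (m.matches()) {
            // Group 0 is the whole match; groups 1..groupCount() are the captures.
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + "): " + m.group(i));
            }
        }
    }
}
```

For the input "ABC" this reports a group count of 4, and groups 0 through 4 hold ABC, ABC, A, BC, and C respectively.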
Example:
The following example illustrates how to find a digit string in a given alphanumeric string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
This would produce the following result:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
Note that group 1 ends with "QT300" and group 2 holds only "0": the first (.*) is greedy and consumes as much of the input as it can, leaving (\d+) only the single digit it needs in order to match.
Regular Expression Syntax:
Here is a table listing the regular expression metacharacter syntax available in Java:
Subexpression   Matches
^               Matches the beginning of a line.
$               Matches the end of a line.
.               Matches any single character except a line terminator. With the
                DOTALL flag, (?s), it matches line terminators as well.
[...]           Matches any single character in the brackets.
[^...]          Matches any single character not in the brackets.
\A              Matches the beginning of the entire string.
\z              Matches the end of the entire string.
\Z              Matches the end of the entire string, except for an allowable final
                line terminator.
re*             Matches 0 or more occurrences of the preceding expression.
re+             Matches 1 or more occurrences of the preceding expression.
re?             Matches 0 or 1 occurrence of the preceding expression.
re{n}           Matches exactly n occurrences of the preceding expression.
re{n,}          Matches n or more occurrences of the preceding expression.
re{n,m}         Matches at least n and at most m occurrences of the preceding expression.
a|b             Matches either a or b.
(re)            Groups regular expressions and remembers the matched text.
(?:re)          Groups regular expressions without remembering the matched text.
(?>re)          Matches an independent pattern without backtracking.
\w              Matches word characters.
\W              Matches non-word characters.
\s              Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S              Matches non-whitespace.
\d              Matches digits. Equivalent to [0-9].
\D              Matches non-digits.
\G              Matches the point where the last match finished.
\n              Back-reference to capturing group number n, where n is a number
                (for example, \1).
\b              Matches a word boundary.
\B              Matches a non-word boundary.
\n, \t, etc.    Matches newlines, carriage returns, tabs, etc.
\Q              Escapes (quotes) all characters up to \E.
\E              Ends quoting begun with \Q.
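A few of these constructs can be tried out with a small sketch. The class name SyntaxDemo and the helper fullMatch are hypothetical. Note that inside a Java string literal each backslash of the regular expression must itself be escaped, so \d is written as "\\d".

```java
import java.util.regex.Pattern;

class SyntaxDemo {
    // Hypothetical helper: true if the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \1 refers back to whatever group 1 captured.
        System.out.println(fullMatch("(\\w+) \\1", "hello hello"));  // true
        // \Q...\E quotes metacharacters, so "." and "*" are taken literally.
        System.out.println(fullMatch("\\Qa.*b\\E", "a.*b"));         // true
        System.out.println(fullMatch("\\Qa.*b\\E", "axxb"));         // false
        // {n,m} bounds the number of repetitions.
        System.out.println(fullMatch("\\d{2,4}", "12345"));          // false
    }
}
```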
Methods of the Matcher Class:
Here is a list of useful instance methods:
Index Methods:
Index methods provide useful index values that show precisely where the match was found in the
input string:
SN  Methods with Description
1   public int start()
    Returns the start index of the previous match.
2   public int start(int group)
    Returns the start index of the subsequence captured by the given group during the previous
    match operation.
3   public int end()
    Returns the offset after the last character matched.
4   public int end(int group)
    Returns the offset after the last character of the subsequence captured by the given group
    during the previous match operation.
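The group-specific variants can be sketched as follows. The class name GroupIndexDemo, the helper groupSpan, and the sample input "id=42;" are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: {start, end} of the given group in the first match,
    // or null if the pattern is not found.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)=(\\d+)";
        String input = "id=42;";
        for (int g = 0; g <= 2; g++) {
            int[] span = groupSpan(regex, input, g);
            System.out.println("group " + g + ": [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

Here group 0 spans [0, 5), group 1 spans [0, 2), and group 2 spans [3, 5). As with end(), the end index is one past the last character of the group.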
Study Methods:
Study methods review the input string and return a boolean indicating whether or not the pattern is
found:
SN  Methods with Description
1   public boolean lookingAt()
    Attempts to match the input sequence, starting at the beginning of the region, against the
    pattern.
2   public boolean find()
    Attempts to find the next subsequence of the input sequence that matches the pattern.
3   public boolean find(int start)
    Resets this matcher and then attempts to find the next subsequence of the input
    sequence that matches the pattern, starting at the specified index.
4   public boolean matches()
    Attempts to match the entire region against the pattern.
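As a rough illustration of find(int start), the sketch below searches from different offsets. The class name FindFromDemo and the helper indexOfMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    // Hypothetical helper: start index of the first match at or after 'from',
    // or -1 if there is none.
    static int indexOfMatch(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "cat cat cat";
        System.out.println(indexOfMatch("cat", input, 0));  // 0
        System.out.println(indexOfMatch("cat", input, 1));  // 4
        System.out.println(indexOfMatch("cat", input, 9));  // -1
    }
}
```

The start index must lie between 0 and the length of the input; otherwise find(int) throws an IndexOutOfBoundsException.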
Replacement Methods:
Replacement methods are useful methods for replacing text in an input string:
SN  Methods with Description
1   public Matcher appendReplacement(StringBuffer sb, String replacement)
    Implements a non-terminal append-and-replace step.
2   public StringBuffer appendTail(StringBuffer sb)
    Implements a terminal append-and-replace step.
3   public String replaceAll(String replacement)
    Replaces every subsequence of the input sequence that matches the pattern with the
    given replacement string.
4   public String replaceFirst(String replacement)
    Replaces the first subsequence of the input sequence that matches the pattern with the
    given replacement string.
5   public static String quoteReplacement(String s)
    Returns a literal replacement String for the specified String. This method produces a String
    that will work as a literal replacement in the appendReplacement method of the Matcher
    class.
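The need for quoteReplacement shows up when the replacement text contains "$" or "\", which replacement strings otherwise treat specially. The following is a minimal sketch; the class name QuoteReplacementDemo and the helper replaceLiterally are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Hypothetical helper: replaces every match of regex with the literal text,
    // so that "$" and "\" in the text are not interpreted.
    static String replaceLiterally(String input, String regex, String literal) {
        return Pattern.compile(regex).matcher(input)
                      .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d+)");
        // Unquoted: "$1" is read as a reference to group 1.
        System.out.println(p.matcher("cost: 5").replaceAll("$1.00"));       // cost: 5.00
        // Quoted: "$1.00" is inserted as-is.
        System.out.println(replaceLiterally("cost: 5", "(\\d+)", "$1.00")); // cost: $1.00
    }
}
```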
The start and end Methods:
The following example counts the number of times the word "cat" appears in the input
string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
        "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
This would produce the following result:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
You can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not
merely a substring in a longer word. It also gives some useful information about where in the input
string the match has occurred.
The start method returns the start index of the previous match, and end returns the index of the
last character matched, plus one.
The matches and lookingAt Methods:
The matches and lookingAt methods both attempt to match an input sequence against a pattern.
The difference, however, is that matches requires the entire input sequence to be matched, while
lookingAt does not.
Both methods always start at the beginning of the input string. Here is an example explaining the
functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}
This would produce the following result:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
The replaceFirst and replaceAll Methods:
The replaceFirst and replaceAll methods replace text that matches a given regular expression. As
their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all
occurrences.
Here is an example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                  "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
This would produce the following result:
The cat says meow. All cats say meow.
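For comparison, replaceFirst can be sketched on the same input. The class name ReplaceFirstDemo and the helper replaceFirstDog are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    // Hypothetical helper: replaces only the first occurrence of "dog" with "cat".
    static String replaceFirstDog(String input) {
        Matcher m = Pattern.compile("dog").matcher(input);
        return m.replaceFirst("cat");
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(replaceFirstDog(input));
    }
}
```

This prints "The cat says meow. All dogs say meow.", leaving the second occurrence untouched.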
The appendReplacement and appendTail Methods:
The Matcher class also provides appendReplacement and appendTail methods for text replacement.
Here is an example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
This would produce the following result:
-foo-foo-foo-
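One reason to use appendReplacement and appendTail rather than replaceAll is that the replacement can be computed separately for each match. The sketch below illustrates the idea; the class name AppendReplacementDemo and the helper doubleNumbers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Hypothetical helper: doubles every integer found in the input.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text since the previous match, then the replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));  // 6 apples and 24 pears
    }
}
```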
PatternSyntaxException Class Methods:
A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular
expression pattern. The PatternSyntaxException class provides the following methods to help you
determine what went wrong:
SN  Methods with Description
1   public String getDescription()
    Retrieves the description of the error.
2   public int getIndex()
    Retrieves the error index.
3   public String getPattern()
    Retrieves the erroneous regular expression pattern.
4   public String getMessage()
    Returns a multi-line string containing the description of the syntax error and its index, the
    erroneous regular expression pattern, and a visual indication of the error index within the
    pattern.
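These methods can be exercised by compiling a deliberately broken pattern. In the sketch below, the class name SyntaxErrorDemo, the helper failingPattern, and the sample pattern "[a-z" (an unclosed character class) are hypothetical choices; the exact description and index text depend on the Java runtime, so no output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Hypothetical helper: returns the pattern reported by the exception,
    // or null if the regular expression compiles without error.
    static String failingPattern(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getPattern();
        }
    }

    public static void main(String[] args) {
        try {
            Pattern.compile("[a-z");  // unclosed character class
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            System.out.println("Message:");
            System.out.println(e.getMessage());
        }
    }
}
```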