Regular Expressions in Java (Notes)
These are notes on regular expressions in Java, not a lesson that explains every detail from the beginning.
Translating Perl Idioms to Java
The following Perl script prints the segments of a string. Segments are separated by colons, but a colon that is preceded by a question mark does not count as a separator.
main.pl
while( "123:456?:789" =~ /(.*?)(?:(?<!\?):|$)/g )
{ print $1, "\n"; }
System.out
123
456?:789
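In Java, the pattern becomes a »java.util.regex.Pattern« object, the source string is wrapped in a »java.util.regex.Matcher« obtained from that pattern, and the "=~" operator with the global flag "g" becomes repeated calls of the »find« operation of the matcher.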
Main.java
public class Main
{ public static void main( final java.lang.String[] args )
  { // "\\?" in the Java string literal yields "\?" in the pattern
    final java.util.regex.Pattern p =
      java.util.regex.Pattern.compile( "(.*?)(?:(?<!\\?):|$)" );
    final java.util.regex.Matcher m = p.matcher( "123:456?:789" );
    // each successful find() corresponds to one iteration of the Perl loop
    while( m.find() )java.lang.System.out.println( m.group( 1 )); }}
System.out
123
456?:789
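One detail of the loop above is easy to overlook: after the second segment, »find« succeeds a third time with an empty match at the very end of the input, so an empty line is printed as well. The following sketch collects the groups in a list to make this visible and, as a possible alternative, splits the text with the same lookbehind. The class name »SegmentDemo« and its method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentDemo {
    // Same pattern as in the program above.
    private static final Pattern SEGMENT =
        Pattern.compile("(.*?)(?:(?<!\\?):|$)");

    // Collects group 1 of every successful find(), including empty matches.
    public static List<String> findSegments(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SEGMENT.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Alternative: split at every colon not preceded by a question mark.
    public static List<String> splitSegments(String text) {
        return Arrays.asList(text.split("(?<!\\?):"));
    }

    public static void main(String[] args) {
        System.out.println(findSegments("123:456?:789"));  // [123, 456?:789, ]
        System.out.println(splitSegments("123:456?:789")); // [123, 456?:789]
    }
}
```

If only the segments are needed, the »split« variant avoids the trailing empty match; the »find« loop is the closer translation of the Perl idiom.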
Simply replacing all occurrences of a pattern with a replacement text can be accomplished with »java.lang.String#replaceAll«, which accepts a regular expression as its first argument and allows groups to be referenced in its second argument by a dollar sign followed by the group number.
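A minimal sketch of such a group reference follows. The class name »SwapDemo«, the method »swap« and the sample text are chosen only for this illustration.

```java
public class SwapDemo {
    // Swaps the two words around each colon; $1 and $2 refer to
    // the first and second capturing group of the pattern.
    public static String swap(String text) {
        return text.replaceAll("(\\w+):(\\w+)", "$2:$1");
    }

    public static void main(String[] args) {
        System.out.println(swap("alpha:beta gamma:delta")); // beta:alpha delta:gamma
    }
}
```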
Cancellation of the Special Meaning of Special Characters
Some characters do not represent themselves within a regular expression but have a special meaning. The following program shows how the static operation »java.util.regex.Pattern.quote« can be used to cancel such a special meaning.
Main.java
// This class is named »String« and thus shadows »java.lang.String«,
// which is therefore always written with its full name below.
class String
{ final java.lang.String string;
  public String( final java.lang.String string )
  { this.string = string; }
  // splits at literal occurrences of »text«, not at matches of a pattern
  public java.lang.String[] split
  ( final java.lang.String text )
  { return this.string.split
    ( java.util.regex.Pattern.quote( text )); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { final String text = new String( "alpha|beta" );
    final java.lang.String[] strings = text.split( "|" );
    for( java.lang.String string : strings )
      java.lang.System.out.println( string ); }}
System.out
alpha
beta
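To see why the quoting matters here, the following sketch compares a split with the unquoted pattern »|« to a split with the quoted one. Without quoting, »|« is an alternation of two empty alternatives, which matches the empty string between any two characters. The class name »QuoteDemo« and its method names are made up for this illustration.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Treats the separator as a regular expression.
    public static String[] splitAsPattern(String text, String separator) {
        return text.split(separator);
    }

    // Treats the separator as literal text.
    public static String[] splitLiterally(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // Unquoted: the text falls apart into many short pieces.
        System.out.println(splitAsPattern("alpha|beta", "|").length);
        // Quoted: exactly the two parts "alpha" and "beta".
        System.out.println(splitLiterally("alpha|beta", "|").length); // 2
    }
}
```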
The following example shows how to cancel special meanings within both the pattern and the replacement text of »java.lang.String#replaceAll«: »java.util.regex.Pattern.quote« is used for the pattern and »java.util.regex.Matcher.quoteReplacement« for the replacement text. However, this is not really needed here, because the operation »java.lang.String#replace«, which does not use regular expressions, could be used instead.
Main.java
// This class is named »String« and thus shadows »java.lang.String«,
// which is therefore always written with its full name below.
class String
{ final java.lang.String string;
  public String( final java.lang.String string )
  { this.string = string; }
  // replaces literal occurrences of »search« by the literal text »replace«
  public java.lang.String replaceAll
  ( final java.lang.String search,
    final java.lang.String replace )
  { return this.string.replaceAll
    ( java.util.regex.Pattern.quote( search ),
      java.util.regex.Matcher.quoteReplacement( replace )); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { java.lang.System.out.println
    ( new String( "alpha's" ).replaceAll( "'", "\\'" )); }}
System.out
alpha\'s
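The following sketch places the quoted »replaceAll« next to the plain »replace« mentioned above and also shows what happens when the replacement text is not quoted: a backslash in a replacement text only marks the next character as literal, so it vanishes from the result. The class name »ReplaceDemo« and its method names are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Regex-based replacement with both arguments quoted.
    public static String quotedReplaceAll(String text, String search, String replacement) {
        return text.replaceAll(Pattern.quote(search),
                               Matcher.quoteReplacement(replacement));
    }

    // Literal replacement without any regular expression.
    public static String plainReplace(String text, String search, String replacement) {
        return text.replace(search, replacement);
    }

    // Regex-based replacement without quoting, for comparison.
    public static String unquotedReplaceAll(String text, String search, String replacement) {
        return text.replaceAll(search, replacement);
    }

    public static void main(String[] args) {
        System.out.println(quotedReplaceAll("alpha's", "'", "\\'"));   // alpha\'s
        System.out.println(plainReplace("alpha's", "'", "\\'"));       // alpha\'s
        System.out.println(unquotedReplaceAll("alpha's", "'", "\\'")); // alpha's
    }
}
```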
Negations
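The following example shows how to match all texts that do not end in ».dwg«. The negative lookbehind »(?<!\.dwg)« at the end of the pattern succeeds only if the characters immediately before that position are not ».dwg«. (The pattern used is based on a comment by Ralf Ullrich.)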
Main.java
class Test
{ final java.lang.String pattern;
  public Test( final java.lang.String pattern )
  { this.pattern = pattern; }
  public void text( final java.lang.String text )
  { java.lang.System.out.println
    ( text + " " +( text.matches( this.pattern ) ? "=" : "!" )+
      "~ " + this.pattern ); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { Test test = new Test( ".*(?<!\\.dwg)" );
    test.text( "abce.dwg" );
    test.text( "abce.dwx" );
    test.text( "abce.xwg" );
    test.text( "ab.cd.fg" );
    test.text( "abc.dwga" );
    test.text( "abcdefgh" ); }}
System.out
abce.dwg !~ .*(?<!\.dwg)
abce.dwx =~ .*(?<!\.dwg)
abce.xwg =~ .*(?<!\.dwg)
ab.cd.fg =~ .*(?<!\.dwg)
abc.dwga =~ .*(?<!\.dwg)
abcdefgh =~ .*(?<!\.dwg)
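For single-line texts such as those above, the pattern should give the same answer as a negated »endsWith« call. The following sketch checks this for the six sample texts; it is only a comparison under that assumption, since ».« does not match line terminators by default and the two tests can differ for texts containing them. The class name »SuffixDemo« and its method name are made up for this illustration.

```java
import java.util.regex.Pattern;

public class SuffixDemo {
    // Compiled once; matches texts that do not end in ".dwg".
    private static final Pattern NOT_DWG = Pattern.compile(".*(?<!\\.dwg)");

    public static boolean lacksDwgSuffix(String text) {
        return NOT_DWG.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abce.dwg", "abce.dwx", "abce.xwg",
            "ab.cd.fg", "abc.dwga", "abcdefgh"
        };
        for (String sample : samples) {
            boolean byPattern = lacksDwgSuffix(sample);
            boolean byEndsWith = !sample.endsWith(".dwg");
            System.out.println(sample + " " + byPattern + " " + byEndsWith);
        }
    }
}
```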