https://www.purl.org/stefan_ram/pub/java-regular-expression-notes (permalink) is the canonical URI of this page.
Stefan Ram

Regular Expressions in Java (Notes)

These are some notes about regular expressions in Java. This is not a lesson explaining every detail from the beginning.

Translating Perl Idioms to Java

The following Perl script extracts the segments of a string. Segments are separated by colons, but a colon that is preceded by a question mark does not count as a separator.

main.pl
while( "123:456?:789" =~ /(.*?)(?:(?<!\?):|$)/g )
{ print $1, "\n"; }
System.out
123
456?:789

In Java, the source string becomes a matcher and the Perl pattern becomes a »java.util.regex.Pattern«. The "=~" operator with the global flag "g" corresponds to repeated calls of the find operation of the matcher.

Main.java
public class Main
{ public static void main( final java.lang.String[] args )
  { final java.util.regex.Pattern p =
      java.util.regex.Pattern.compile( "(.*?)(?:(?<!\\?):|$)" );
    final java.util.regex.Matcher m = p.matcher( "123:456?:789" );
    /* each successful find corresponds to one iteration of the Perl loop */
    while( m.find() )java.lang.System.out.println( m.group( 1 )); }}
System.out
123
456?:789
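
The same loop can also collect the segments instead of printing them. The following is only a sketch: the class name »SegmentDemo« and the operation »segments« are illustrative and not part of the original notes. It prints each segment in brackets, because, as far as the documented behavior of »find« goes, the pattern also matches one final empty segment at the end of the input, which is easy to overlook when it is printed as an empty line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentDemo {
    // Same pattern as in the listing above: a lazy segment, followed either
    // by a colon that is not preceded by "?" or by the end of the input.
    static final Pattern SEGMENT = Pattern.compile("(.*?)(?:(?<!\\?):|$)");

    // Collects group 1 of every match found by repeated find() calls.
    static List<String> segments(final String source) {
        final List<String> result = new ArrayList<>();
        final Matcher m = SEGMENT.matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(final String[] args) {
        // Brackets make an empty segment visible in the output.
        for (final String segment : segments("123:456?:789")) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

If the empty segment at the end is not wanted, one option is to skip matches whose group 1 is empty and which start at the end of the input.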

Merely replacing all occurrences of a pattern with a replacement text can already be accomplished with »java.lang.String#replaceAll«. It accepts a pattern as its first argument, and its second argument may refer to groups of the match by a dollar sign followed by the group number.
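
A minimal sketch of such a group reference follows. The class name »ReplaceAllDemo«, the operation »swap« and the sample text are made up for illustration.

```java
class ReplaceAllDemo {
    // Turns every "key=value" pair into "value=key".
    // $1 and $2 in the replacement refer to the first and second group.
    static String swap(final String text) {
        return text.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(final String[] args) {
        System.out.println(swap("a=1, b=2"));
    }
}
```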

Cancellation of the Special Meaning of Special Characters

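Some characters do not represent themselves within a regular expression but have a special meaning. The following program shows how the static operation »java.util.regex.Pattern.quote« can be used to cancel such a special meaning. Note that the program declares its own class »String«, which is why »java.lang.String« is always written with its full name.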

Main.java
class String
{ final java.lang.String string;
  public String( final java.lang.String string )
  { this.string = string; }
  /* splits at literal occurrences of the text, not at matches of a pattern */
  public java.lang.String[] split
  ( final java.lang.String text )
  { return this.string.split
    ( java.util.regex.Pattern.quote( text )); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { final String text = new String( "alpha|beta" );
    final java.lang.String[] strings = text.split( "|" );
    for( java.lang.String string : strings )
    java.lang.System.out.println( string ); }}
System.out
alpha
beta
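
To see why the quoting matters, the following sketch compares a quoted and an unquoted split. The class name »QuoteDemo« and the operation »splitLiteral« are illustrative only. Unquoted, "|" is an alternation of two empty alternatives, so it matches the empty string and the text is not split at the bar.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {
    // Splits the text at literal occurrences of the separator.
    static String[] splitLiteral(final String text, final String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(final String[] args) {
        // The quoted form of the separator, as a regular expression.
        System.out.println(Pattern.quote("|"));
        // Unquoted: the separator is interpreted as a pattern.
        System.out.println(Arrays.toString("alpha|beta".split("|")));
        // Quoted: the separator is interpreted literally.
        System.out.println(Arrays.toString(splitLiteral("alpha|beta", "|")));
    }
}
```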

The following example shows how to cancel special meanings within both the pattern and the replacement text of »java.lang.String#replaceAll«. However, this is not really needed here, because the operation »java.lang.String#replace«, which does not use regular expressions, could be used instead.

Main.java
class String
{ final java.lang.String string;
  public String( final java.lang.String string )
  { this.string = string; }
  /* both the search text and the replacement text are taken literally */
  public java.lang.String replaceAll
  ( final java.lang.String search,
    final java.lang.String replace )
  { return this.string.replaceAll
    ( java.util.regex.Pattern.quote( search ),
      java.util.regex.Matcher.quoteReplacement( replace )); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { java.lang.System.out.println
    ( new String( "alpha's" ).replaceAll( "'", "\\'" )); }}
System.out
alpha\'s
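
The alternative without regular expressions might look as follows. This is a sketch with the illustrative names »ReplacementDemo« and »escapeApostrophes«. It also shows what happens when the replacement text is passed to »replaceAll« unquoted: there, a backslash escapes the following character, so the backslash itself disappears from the result.

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    // Literal replacement: neither argument has any special meaning.
    static String escapeApostrophes(final String text) {
        return text.replace("'", "\\'");
    }

    public static void main(final String[] args) {
        // Literal replace, no regular expressions involved.
        System.out.println(escapeApostrophes("alpha's"));
        // Unquoted replacement text: the backslash acts as an escape character.
        System.out.println("alpha's".replaceAll("'", "\\'"));
        // Quoted replacement text: the backslash is kept.
        System.out.println("alpha's".replaceAll("'", Matcher.quoteReplacement("\\'")));
    }
}
```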

Negations

The following example shows how to match all texts that do not end in the text ».dwg«. (The pattern used is based on a comment by Ralf Ullrich.)

Main.java
class Test
{ final java.lang.String pattern;
  public Test( final java.lang.String pattern )
  { this.pattern = pattern; }
  /* prints "=~" if the whole text matches the pattern and "!~" otherwise */
  public void text( final java.lang.String text )
  { java.lang.System.out.println
    ( text + " " +( text.matches( this.pattern ) ? "=" : "!" )+
      "~ " + this.pattern ); }}

public class Main
{ public static void main( final java.lang.String[] args )
  { Test test = new Test( ".*(?<!\\.dwg)" );
    test.text( "abce.dwg" );
    test.text( "abce.dwx" );
    test.text( "abce.xwg" );
    test.text( "ab.cd.fg" );
    test.text( "abc.dwga" );
    test.text( "abcdefgh" ); }}
System.out
abce.dwg !~ .*(?<!\.dwg)
abce.dwx =~ .*(?<!\.dwg)
abce.xwg =~ .*(?<!\.dwg)
ab.cd.fg =~ .*(?<!\.dwg)
abc.dwga =~ .*(?<!\.dwg)
abcdefgh =~ .*(?<!\.dwg)
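
For comparison, the same test can be written with a negative lookahead or without regular expressions at all. The following sketch uses the illustrative names »NegationDemo«, »viaLookbehind«, »viaLookahead« and »viaEndsWith«. The three variants agree on the sample texts above, but they are not guaranteed to agree for texts that contain line terminators, because ».« does not match those by default.

```java
import java.util.regex.Pattern;

class NegationDemo {
    // The pattern from the listing above: anything not ending in ".dwg".
    static final Pattern LOOKBEHIND = Pattern.compile(".*(?<!\\.dwg)");
    // Alternative: reject at the start if the text ends in ".dwg".
    static final Pattern LOOKAHEAD = Pattern.compile("(?!.*\\.dwg$).*");

    static boolean viaLookbehind(final String text) {
        return LOOKBEHIND.matcher(text).matches();
    }

    static boolean viaLookahead(final String text) {
        return LOOKAHEAD.matcher(text).matches();
    }

    // No regular expression at all.
    static boolean viaEndsWith(final String text) {
        return !text.endsWith(".dwg");
    }

    public static void main(final String[] args) {
        final String[] samples =
            { "abce.dwg", "abce.dwx", "abce.xwg", "ab.cd.fg", "abc.dwga", "abcdefgh" };
        for (final String sample : samples) {
            System.out.println(sample + " " + viaLookbehind(sample)
                + " " + viaLookahead(sample) + " " + viaEndsWith(sample));
        }
    }
}
```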

Copyright 2004 Stefan Ram, Berlin. All rights reserved. This page is a publication by Stefan Ram. "ram@zedat.fu-berlin.de" (without the quotation marks) is the email address of Stefan Ram.