Thursday, May 01, 2008

Regular Expressions and Java

What happens when you do the following –
Example 0)
String str = "someString";
str.replaceAll("*", "#");


At first glance, the above snippet should replace every ‘*’ character in the String str with ‘#’. Simple, isn’t it? Well, if it were that straightforward, why would it be here ;)

The replaceAll() method in the String class has the following signature: replaceAll(String regex, String replacement)

Yeah, I hope the term regex caught your attention. Java treats the first argument as a regular expression, and as a result the above snippet throws a lovely exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0

Lovely, because it uses the word dangling ;)
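
If you want to see the exception for yourself, here is a minimal sketch that runs Example 0 inside a try/catch. The class name DanglingStarDemo and the helper tryReplace are made up for illustration; PatternSyntaxException and its getDescription(), getIndex() and getPattern() accessors come from java.util.regex.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original post.
class DanglingStarDemo {

    // Runs replaceAll and returns the exception it throws,
    // or null if the regex compiled without complaint.
    static PatternSyntaxException tryReplace(String input, String regex, String replacement) {
        try {
            input.replaceAll(regex, replacement);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Same call as Example 0: a lone * is not a valid regular expression.
        PatternSyntaxException e = tryReplace("someString", "*", "#");
        if (e != null) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
        }
    }
}
```

The exception is unchecked, so the compiler will not warn you about it; it only shows up at runtime.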
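If you know regular expressions even a little, you know that the character ‘*’ has a special meaning: it matches whatever comes right before it zero or more times. In Example 0) nothing comes before it, hence the "dangling". Let’s take some examples so that this is clear.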

Example 1)
String str = "the Blue Umbrella is bllue in collor";
System.out.println(str.replaceAll("ll*","?"));

Output: the B?ue Umbre?a is b?ue in co?or
Explanation: the regular expression ll* matches one l followed by zero or more l's, i.e. the strings {l, ll, lll, ...}, and the code replaces each match with ?
Note: a plain l* would also match the empty string, i.e. {emptyString, l, ll, lll, ...}
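
The difference between ll* and l* can be checked with a small sketch like the one below. StarQuantifierDemo, wholeMatch and collapseLs are hypothetical names used only here; String.matches() tests the whole string against the regex, and l+ ("one or more l") is used as an equivalent way of writing ll*.

```java
// Hypothetical demo class, not part of the original post.
class StarQuantifierDemo {

    // True only if the ENTIRE input matches the regex.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Replaces every run of one or more l's with a single ?.
    static String collapseLs(String input) {
        return input.replaceAll("l+", "?");
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("", "l*"));     // true: zero l's is fine for l*
        System.out.println(wholeMatch("", "ll*"));    // false: ll* needs at least one l
        System.out.println(wholeMatch("lll", "ll*")); // true
        // Same result as Example 1, since l+ and ll* match the same strings.
        System.out.println(collapseLs("the Blue Umbrella is bllue in collor"));
    }
}
```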

Example 2)
String str = "the Blue Umbrella is blue in color";
System.out.println(str.replaceAll("rella.*","???"));
Output: the Blue Umb???
Explanation: ‘.’ (dot) means any character (except line terminators, by default), so .* means any character zero or more times. Hence everything from rella onwards gets replaced with ???


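So in Example 0), to replace * with #, you have to tell Java not to treat * as a regular expression metacharacter but to take it as it is. You do that by escaping it with a backslash, and since the backslash itself has to be doubled inside a Java string literal, you write str.replaceAll("\\*", "#"). Just escape your *, that's all!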
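
Here is a minimal sketch of the fix, plus two alternatives that avoid hand-written escaping. EscapeStarDemo and its method names are made up for illustration; Pattern.quote() and String.replace(CharSequence, CharSequence) are standard JDK methods. Also keep in mind that Strings are immutable, so you have to use the returned value; str itself is never changed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class EscapeStarDemo {

    // Escape the * by hand: "\\*" in Java source is the regex \*
    static String withEscapedRegex(String s) {
        return s.replaceAll("\\*", "#");
    }

    // Let Pattern.quote() turn the text into a literal pattern.
    static String withQuotedRegex(String s) {
        return s.replaceAll(Pattern.quote("*"), "#");
    }

    // replace() takes plain text, not a regex, so no escaping is needed.
    static String withLiteralReplace(String s) {
        return s.replace("*", "#");
    }

    public static void main(String[] args) {
        String str = "2 * 3 * 4";
        System.out.println(withEscapedRegex(str));   // 2 # 3 # 4
        System.out.println(withQuotedRegex(str));    // 2 # 3 # 4
        System.out.println(withLiteralReplace(str)); // 2 # 3 # 4
        System.out.println(str);                     // 2 * 3 * 4 (original is untouched)
    }
}
```

If you never need regex features, replace() is probably the simplest choice; replaceAll() with an escaped pattern makes sense when the literal * is only one part of a larger expression.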