Monday, 4 August 2014

Pattern with empty regexp in Java

It came as quite a surprise to me that Pattern.compile("") with an empty regexp produces matches for any string, be it empty or non-empty:
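    import android.util.Log;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // An empty regexp yields a zero-length match at every position of the input.
    Pattern p = Pattern.compile("");
    Matcher m = p.matcher("abc");
    while (m.find()) {
        Log.i("Matcher start: ", String.valueOf(m.start()));
        Log.i("Matcher end: ", String.valueOf(m.end()));
    }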

This listing will produce the following output:
I/Matcher start:﹕ 0
I/Matcher end:﹕ 0
I/Matcher start:﹕ 1
I/Matcher end:﹕ 1
I/Matcher start:﹕ 2
I/Matcher end:﹕ 2
I/Matcher start:﹕ 3
I/Matcher end:﹕ 3

Hmm... the string "abc" contains 4 "empty characters", that is, zero-length matches: one before each of the three characters and one at the very end. An empty character starts and ends at the same position in the text, in contrast to, say, the character "a", which starts at position 0 and ends at 1. The string "" also matches this pattern, with both matcher.start() and matcher.end() equal to 0.
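The same behaviour can be reproduced outside Android. The following is a minimal sketch in plain Java; the class name EmptyPatternDemo and the helper matchStarts are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyPatternDemo {
    // Hypothetical helper: collects the start offset of every match reported by find().
    public static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("", "abc")); // [0, 1, 2, 3]
        System.out.println(matchStarts("", ""));    // [0]
    }
}
```

For an input of length n, the empty regexp gives n + 1 matches, one per position.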

Contrast an empty character with an empty string:
    Pattern p = Pattern.compile("^$");

This pattern with an "empty string" regexp will not find a match in the "abc" string but will match the "" string: ^ anchors the start of the input and $ anchors its end, so nothing may lie between them. The "" is an empty string that also contains an empty character.
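A rough way to see the difference is to compare find(), which looks for a match anywhere in the input, with matches(), which requires the whole input to match. This is only a sketch; the class EmptyStringPatternDemo and its two helpers are illustrative names, not an existing API.

```java
import java.util.regex.Pattern;

class EmptyStringPatternDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: true only if the regex matches the entire input.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(found("^$", "abc"));      // false
        System.out.println(found("^$", ""));         // true
        System.out.println(found("", "abc"));        // true
        System.out.println(matchesWhole("", "abc")); // false
        System.out.println(matchesWhole("", ""));    // true
    }
}
```

So the empty regexp behaves like the "empty string" regexp once it is forced to cover the whole input with matches().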

There are a lot of ways to supply a regexp that wouldn't match anything, for example the "$matchnothing" regexp: $ anchors the end of the input (or the position just before a final line terminator), so no ordinary character can follow it.
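Here is a small sketch that checks this against a few inputs. The class name NeverMatchDemo and the helper neverFound are invented for the example, and the second regexp, "(?!)" (an empty negative lookahead), is an alternative I am adding for comparison; it is not mentioned above.

```java
import java.util.regex.Pattern;

class NeverMatchDemo {
    // Hypothetical helper: true if the regex is found in none of the given inputs.
    public static boolean neverFound(String regex, String... inputs) {
        Pattern p = Pattern.compile(regex);
        for (String input : inputs) {
            if (p.matcher(input).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"", "abc", "matchnothing", "$matchnothing", "abc\nmatchnothing"};
        System.out.println(neverFound("$matchnothing", samples)); // true
        System.out.println(neverFound("(?!)", samples));          // true
    }
}
```

Note that the $ in the regexp is an anchor, not a literal dollar sign, so even the input "$matchnothing" is not matched.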
