Java Controlling Case in Regular Expressions
Problem
You want to find text regardless of case.
Solution
Compile the Pattern, passing Pattern.CASE_INSENSITIVE as the flags argument to indicate that matching should be case-independent (“fold” or ignore differences in case). If your input may contain non-ASCII letters, for example because your code might run in different locales (see Chapter 15), also add Pattern.UNICODE_CASE, since CASE_INSENSITIVE by itself folds only US-ASCII letters. Without these flags, matching is case-sensitive by default. Flags are passed as the second argument to Pattern.compile(), as in:
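// CaseMatch.java
import java.util.regex.Pattern;

Pattern reCaseInsens = Pattern.compile(pattern,
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
// Pattern has no instance matches(input); go through a Matcher instead
boolean isMatch = reCaseInsens.matcher(input).matches(); // matches case-insensitively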
These flags must be passed when you create the Pattern; because Pattern objects are immutable, they cannot be changed once constructed.
The full source code for this example is online as CaseMatch.java.
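For a quick, self-contained comparison of the three cases (no flags, CASE_INSENSITIVE alone, and CASE_INSENSITIVE | UNICODE_CASE), here is a minimal sketch. It is not the book's CaseMatch.java; the class CaseMatchSketch and its helper methods are hypothetical names chosen for illustration. The non-ASCII lines rely on the documented behavior that CASE_INSENSITIVE by itself folds only US-ASCII letters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not the book's CaseMatch.java.
public class CaseMatchSketch {

    // No flags: the default, case-sensitive matching
    static boolean matchesExactCase(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // CASE_INSENSITIVE alone: folds case for US-ASCII letters only
    static boolean matchesAsciiCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // CASE_INSENSITIVE | UNICODE_CASE: Unicode-aware case folding
    static boolean matchesAnyCase(String regex, String input) {
        Pattern reCaseInsens = Pattern.compile(regex,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return reCaseInsens.matcher(input).matches();
    }

    public static void main(String[] args) {
        String lowerE = "\u00E9"; // small e with acute accent
        String upperE = "\u00C9"; // capital E with acute accent
        System.out.println(matchesExactCase("java", "JAVA")); // false
        System.out.println(matchesAsciiCase("java", "JAVA")); // true
        System.out.println(matchesAsciiCase(lowerE, upperE)); // false
        System.out.println(matchesAnyCase(lowerE, upperE));   // true
    }
}
```

Compiling a new Pattern on every call keeps the sketch short; in real code you would typically compile once and reuse the Pattern, which is safe precisely because Pattern objects are immutable.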
Pattern.compile() Flags
Several flags can be passed as the second argument to Pattern.compile(). If more than one is needed, combine them with the bitwise OR operator (|). In alphabetical order, the main flags are:
CANON_EQ
Enables so-called “canonical equivalence”: two characters match if, and only if, their full canonical decompositions match. For example, the letter e followed by the combining acute accent (U+0301) then matches the precomposed character é (U+00E9), and vice versa (see Recipe 4.8).
CASE_INSENSITIVE
Turns on case-insensitive matching (see Recipe 4.7).
COMMENTS
Causes whitespace and comments (from # to end-of-line) to be ignored in the pattern.
DOTALL
Allows dot (.) to match any character, including line terminators; by default, dot matches any character except a line terminator (see Recipe 4.9).
MULTILINE
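Enables multiline mode, in which ^ and $ match just after and just before a line terminator, not only at the beginning and end of the whole input (see Recipe 4.9).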
UNICODE_CASE
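Enables Unicode-aware case folding when combined with CASE_INSENSITIVE; without it, case-insensitive matching considers only US-ASCII letters (see Recipe 4.7).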
UNIX_LINES
Makes \n the only line terminator recognized by ., ^, and $ (see Recipe 4.9).
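To see several of these flags in action, here is a small sketch, assuming Java 8 or later. The class FlagsSketch and its methods are hypothetical names for illustration. Passing 0 as the flags argument gives the default behavior, and the CASE_INSENSITIVE | DOTALL line shows two flags combined with |.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the effect of several compile flags.
public class FlagsSketch {

    // Does "a.b" match the whole input under the given flags?
    static boolean dotMatches(String input, int flags) {
        return Pattern.compile("a.b", flags).matcher(input).matches();
    }

    // Counts how many times "^#" is found under the given flags
    static int countLineStarts(String text, int flags) {
        Matcher m = Pattern.compile("^#", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // COMMENTS: the spaces and the trailing "# ..." comment are ignored,
    // so this pattern is equivalent to \d{4}-\d{2}
    static boolean matchesYearMonth(String input) {
        Pattern p = Pattern.compile("\\d{4} - \\d{2}   # year, hyphen, month",
                Pattern.COMMENTS);
        return p.matcher(input).matches();
    }

    // Pattern is 'e' plus the combining acute accent (U+0301);
    // input is the precomposed e-acute (U+00E9)
    static boolean matchesComposedE(int flags) {
        return Pattern.compile("e\u0301", flags).matcher("\u00E9").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("a\nb", 0));                   // false
        System.out.println(dotMatches("a\nb", Pattern.DOTALL));      // true
        System.out.println(dotMatches("a\rb", 0));                   // false
        System.out.println(dotMatches("a\rb", Pattern.UNIX_LINES));  // true
        // Two flags combined with the bitwise OR operator
        System.out.println(dotMatches("A\nB",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL));         // true
        String text = "#one\n#two\nthree";
        System.out.println(countLineStarts(text, 0));                // 1
        System.out.println(countLineStarts(text, Pattern.MULTILINE)); // 2
        System.out.println(matchesYearMonth("2024-05"));             // true
        System.out.println(matchesComposedE(0));                     // false
        System.out.println(matchesComposedE(Pattern.CANON_EQ));      // true
    }
}
```

The UNIX_LINES case uses a carriage return: by default \r counts as a line terminator, so dot refuses it, while under UNIX_LINES only \n does.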