Java Controlling Case in Regular Expressions - Supercoders | Web Development and Design | Tutorial for Java, PHP, HTML, Javascript

Saturday, December 29, 2018

Java Controlling Case in Regular Expressions

Problem

You want to find text regardless of case.

Solution

Compile the Pattern, passing Pattern.CASE_INSENSITIVE as the flags argument to
indicate that matching should be case-independent (that is, it should “fold” or
ignore differences in case). On its own, this flag folds only US-ASCII letters. If
your text may contain letters outside that range, for example because your code
runs in different locales (see Chapter 15), add Pattern.UNICODE_CASE as well.
Without these flags, the default is normal, case-sensitive matching. This flag
(like the others) is passed to the Pattern.compile() method, as in:

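// CaseMatch.java
import java.util.regex.Pattern;

Pattern reCaseInsens = Pattern.compile(pattern,
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
// A Pattern is matched through a Matcher; matches() tests the whole input,
// here case-insensitively
boolean found = reCaseInsens.matcher(input).matches();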

This flag must be passed when you create the Pattern; because Pattern objects are immutable, they cannot be changed once constructed. The full source code for this example is online as CaseMatch.java.
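The difference between the two flags is easiest to see side by side. The following is a minimal sketch, not the CaseMatch.java program itself; the class name CaseMatchDemo, the helper matchesIgnoringCase, and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class CaseMatchDemo {

    /** Returns true if the whole input matches the regex, ignoring case (hypothetical helper). */
    static boolean matchesIgnoringCase(String regex, String input, boolean unicodeAware) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicodeAware) {
            flags |= Pattern.UNICODE_CASE;   // extend case folding beyond US-ASCII
        }
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // No flags: matching is case-sensitive.
        System.out.println(Pattern.compile("java").matcher("JAVA").matches());     // false
        // ASCII letters fold with CASE_INSENSITIVE alone.
        System.out.println(matchesIgnoringCase("java", "JAVA", false));            // true
        // Non-ASCII letters such as \u00e9 / \u00c9 (e-acute) also need UNICODE_CASE.
        System.out.println(matchesIgnoringCase("caf\u00e9", "CAF\u00c9", false));  // false
        System.out.println(matchesIgnoringCase("caf\u00e9", "CAF\u00c9", true));   // true
    }
}
```

Building the flags as an int and compiling once keeps the Pattern immutable, as described above; to change the case behavior, compile a new Pattern.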

Pattern.compile() Flags

Pattern.compile() accepts flags as its second argument; seven of them are described here. If more than one is needed, combine them with the bitwise OR operator (|). In alphabetical order, the flags are:

CANON_EQ 

Enables so-called “canonical equivalence”: characters are compared by their canonical decomposition, so the composite character é and the letter e followed by the combining acute accent (U+0301) are treated as the same text, and a pattern written in one form can match input written in the other (see Recipe 4.8).

CASE_INSENSITIVE 

Turns on case-insensitive matching (see Recipe 4.7). 

COMMENTS 

Causes whitespace and comments (from # to end-of-line) to be ignored in the pattern. 

DOTALL 

Allows dot (.) to match any character, including the newline; by default, dot matches any character except a line terminator (see Recipe 4.9).

MULTILINE 

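Specifies multiline mode, in which ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the entire input (see Recipe 4.9).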

UNICODE_CASE 

Enables Unicode-aware case folding (see Recipe 4.7). 

UNIX_LINES 

Makes \n the only line terminator recognized by ., ^, and $ (see Recipe 4.9).
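Several of the flags above can be seen in one short program. This is an illustrative sketch only; the class name PatternFlagsDemo, the helper countMatches, and the sample text are assumptions introduced here, not part of the recipe's source code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    /** Counts how many times the regex is found in the input under the given flags (hypothetical helper). */
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\nTwo\nthree";

        // DOTALL: dot also matches the newline, so ".*" can span all three lines.
        System.out.println(Pattern.compile("one.*three").matcher(text).matches());                 // false
        System.out.println(Pattern.compile("one.*three", Pattern.DOTALL).matcher(text).matches()); // true

        // MULTILINE: ^ and $ match at each line boundary, not just at the ends of the input.
        System.out.println(countMatches("^\\w+$", 0, text));                  // 0
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, text));  // 3

        // Flags are combined with |; COMMENTS allows whitespace and # comments in the pattern.
        int flags = Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(countMatches("^ t \\w+   # a word starting with t or T", flags, text)); // 2

        // CANON_EQ: "e" plus combining acute accent (U+0301) matches precomposed U+00E9.
        System.out.println(Pattern.compile("e\u0301", Pattern.CANON_EQ).matcher("\u00e9").matches()); // true
    }
}
```

Passing 0 as the flags argument is equivalent to compiling with no flags, which makes it easy to compare the default behavior with each mode.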
