Java Matching Newlines in Text
You need to match newlines in text.
Solution
Use \n or \r in your regex pattern.
See also the flag constant Pattern.MULTILINE, which makes ^ and $ match at the
beginning and end of each line as well as at the beginning and end of the input.
Explained
While line-oriented tools from Unix such as sed and grep match regular expressions
one line at a time, not all tools do. The sam text editor from Bell Laboratories was
the first interactive tool I know of to allow multiline regular expressions; the Perl
scripting language followed shortly. In the Java API, a regex is applied to the whole
input string rather than to one line at a time. The BufferedReader method readLine()
strips the line terminator from each line it returns. If you read in gobs of characters
using some method other than readLine(), you may have some number of \n, \r, or \r\n
sequences in your text string. By default all of these are recognized as line
terminators. If you want only \n to be recognized, pass the UNIX_LINES flag to the
Pattern.compile() method.
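The effect of UNIX_LINES is easiest to see with the . metacharacter, which by default refuses to match any line terminator. The following is a minimal sketch, not part of the original recipe; the class name UnixLinesDemo and the helper dotFinds are made up for illustration.

```java
import java.util.regex.Pattern;

/** Sketch: how UNIX_LINES changes which characters count as line terminators. */
public class UnixLinesDemo {

    /** True if the regex "a.b" is found in the input under the given flags. */
    static boolean dotFinds(String input, int flags) {
        return Pattern.compile("a.b", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String cr = "a\rb";   // carriage return between a and b
        String lf = "a\nb";   // newline between a and b

        // Default mode: both \r and \n are line terminators, so . skips them.
        System.out.println(dotFinds(cr, 0));                   // false
        System.out.println(dotFinds(lf, 0));                   // false

        // UNIX_LINES: only \n is a line terminator, so . now matches \r.
        System.out.println(dotFinds(cr, Pattern.UNIX_LINES));  // true
        System.out.println(dotFinds(lf, Pattern.UNIX_LINES));  // false
    }
}
```

Passing 0 as the flags argument is the same as calling the one-argument Pattern.compile().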
In Unix, ^ and $ are commonly used to match the beginning or end of a line,
respectively. In this API, the regex metacharacters ^ and $ by default ignore embedded
line terminators and match only at the beginning and the end, respectively, of the
entire string (though $ also matches just before a line terminator that ends the
string). However, if you pass the MULTILINE flag into Pattern.compile(), these
expressions match just after or just before, respectively, a line terminator; $ also
matches the very end of the string. The one place a line terminator is special by
default is the . metacharacter, which does not match it unless you also pass the
DOTALL flag. If you want to know exactly where a line ending is, \n or \r in the
pattern matches it in any mode. See the sidebar "Pattern.compile() Flags". An
example of newline matching is shown in Example 4-6.
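To make the difference between the default and MULTILINE behavior of ^ concrete, here is a small sketch that collects the first word of every line. The class name LineAnchorDemo, the helper firstWords, and the sample text are illustrative assumptions, not taken from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: ^ anchors to the whole input by default, to each line with MULTILINE. */
public class LineAnchorDemo {

    /** Collect every match of ^\w+ in the input under the given flags. */
    static List<String> firstWords(String input, int flags) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("^\\w+", flags).matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "alpha one\nbeta two\ngamma three";

        // Default: ^ matches only at the very start of the input.
        System.out.println(firstWords(input, 0));                  // [alpha]

        // MULTILINE: ^ also matches just after each line terminator.
        System.out.println(firstWords(input, Pattern.MULTILINE));  // [alpha, beta, gamma]
    }
}
```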
Example 4-6. NLMatch.java
import java.util.regex.*;

/**
 * Show line ending matching using regex class.
 * @author Ian F. Darwin, ian@darwinsys.com
 * @version $Id: ch04,v 1.4 2004/05/04 20:11:27 ian Exp $
 */
public class NLMatch {
    public static void main(String[] argv) {

        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println();

        String[] patt = {
            "engines.more engines",
            "engines$"
        };

        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);

            boolean found;
            Pattern p1l = Pattern.compile(patt[i]);
            found = p1l.matcher(input).find();
            System.out.println("DEFAULT match " + found);

            Pattern pml = Pattern.compile(patt[i],
                Pattern.DOTALL|Pattern.MULTILINE);
            found = pml.matcher(input).find();
            System.out.println("MultiLine match " + found);
            System.out.println();
        }
    }
}
If you run this code, the first pattern (with the wildcard character .) matches only when DOTALL is set, because by default . does not match a line terminator; the second pattern (with $) matches only when MULTILINE is set.
> java NLMatch
INPUT: I dream of engines
more engines, all day long

PATTERN engines.more engines
DEFAULT match false
MultiLine match true

PATTERN engines$
DEFAULT match false
MultiLine match true
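Because NLMatch sets DOTALL and MULTILINE together, it cannot show which flag is responsible for which match. The sketch below, using the same input and patterns, tries each flag on its own; the class name FlagIsolationDemo and the helper found are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

/** Sketch: apply DOTALL and MULTILINE separately to the NLMatch patterns. */
public class FlagIsolationDemo {

    static final String INPUT = "I dream of engines\nmore engines, all day long";

    /** True if the regex is found anywhere in INPUT under the given flags. */
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        String dot = "engines.more engines";
        String eol = "engines$";

        // Only DOTALL lets . cross the embedded newline.
        System.out.println(found(dot, 0));                  // false
        System.out.println(found(dot, Pattern.DOTALL));     // true
        System.out.println(found(dot, Pattern.MULTILINE));  // false

        // Only MULTILINE lets $ match just before the embedded newline.
        System.out.println(found(eol, 0));                  // false
        System.out.println(found(eol, Pattern.DOTALL));     // false
        System.out.println(found(eol, Pattern.MULTILINE));  // true
    }
}
```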