Java Matching Newlines in Text - Supercoders | Web Development and Design | Tutorial for Java, PHP, HTML, Javascript

Saturday, December 29, 2018

Java Matching Newlines in Text


Problem

You need to match newlines in text.

Solution

Use \n or \r in the regex pattern.

See also the flag constant Pattern.MULTILINE, which makes ^ and $ match at the
beginning and end of each line rather than only at the beginning and end of the
whole string.
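
As a minimal sketch of the Solution, the following counts line breaks by matching \n and \r explicitly. The class name NewlineCount and the helper countLineBreaks are hypothetical names chosen for illustration; only Pattern and Matcher come from the Java API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NewlineCount {
    // Try \r\n first so that a Windows-style line ending is counted once, not twice.
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\r|\\n");

    /** Counts the line breaks (\r\n, \r, or \n) found in the text. */
    static int countLineBreaks(String text) {
        Matcher m = LINE_BREAK.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Three different line endings separate four words.
        String text = "one\ntwo\r\nthree\rfour";
        System.out.println(countLineBreaks(text)); // prints 3
    }
}
```

The order of the alternatives matters here: with \r listed before \r\n, the two characters of a Windows line ending would be counted as two separate breaks.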

Explained

While line-oriented Unix tools such as sed and grep match regular expressions
one line at a time, not all tools do. The sam text editor from Bell Laboratories was
the first interactive tool I know of to allow multiline regular expressions; the Perl
scripting language followed shortly. In the Java API, a newline in the input is an
ordinary character that a pattern can match explicitly. The BufferedReader method
readLine( ) strips the line terminator from each line it returns. If you read in gobs
of characters using some method other than readLine( ), you may have some number of
\n, \r, or \r\n sequences in your text string. Normally the regex engine treats all of
these as line terminators, equivalent to \n. If you want only \n to be recognized as a
line terminator, pass the UNIX_LINES flag to the Pattern.compile( ) method.
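
The effect of UNIX_LINES can be sketched with a string whose lines are separated by a bare \r. The class name UnixLinesDemo and the helper found are hypothetical; the flags and methods are those of java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

public class UnixLinesDemo {
    /** Returns true if the regex is found anywhere in the input under the given flags. */
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "first\rsecond"; // two lines separated by a bare \r

        // Default line terminators: \r ends a line, so ^ can match right after it.
        System.out.println(found("^second", Pattern.MULTILINE, input)); // true

        // UNIX_LINES: only \n ends a line, so "second" no longer starts a line.
        System.out.println(
            found("^second", Pattern.MULTILINE | Pattern.UNIX_LINES, input)); // false

        // The same flag changes what . may match: \r is excluded by default...
        System.out.println(found("first.second", 0, input)); // false

        // ...but under UNIX_LINES only \n is excluded, so . matches the \r.
        System.out.println(found("first.second", Pattern.UNIX_LINES, input)); // true
    }
}
```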

In Unix, ^ and $ are commonly used to match the beginning or end of a line,
respectively. In this API, the regex metacharacters ^ and $ ignore line terminators by
default and match only at the beginning and the end, respectively, of the entire
string. However, if you pass the MULTILINE flag into Pattern.compile( ), these
expressions match just after or just before, respectively, a line terminator; $ also
matches the very end of the string. The wildcard . does not match a line terminator by
default; it does so only if you also pass the DOTALL flag. If you want to know exactly
where a line ending is, \n or \r in the pattern matches it directly, with or without
flags. In other words, apart from its effect on ., ^, and $, a newline character is
just another character to this API. See the sidebar “Pattern.compile( ) Flags”. An
example of newline matching is shown in Example 4-6.

Example 4-6. NLMatch.java
import java.util.regex.*;

/**
 * Show line ending matching using regex class.
 * @author Ian F. Darwin, ian@darwinsys.com
 * @version $Id: ch04,v 1.4 2004/05/04 20:11:27 ian Exp $
 */
public class NLMatch {
    public static void main(String[] argv) {
        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println( );

        String[] patt = {
            "engines.more engines",
            "engines$"
        };
        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);
            boolean found;
            // First try the pattern with the default flags.
            Pattern p1l = Pattern.compile(patt[i]);
            found = p1l.matcher(input).find( );
            System.out.println("DEFAULT match " + found);
            // Then let . match line terminators, and ^ and $ match at each line.
            Pattern pml = Pattern.compile(patt[i],
                Pattern.DOTALL|Pattern.MULTILINE);
            found = pml.matcher(input).find( );
            System.out.println("MultiLine match " + found);
            System.out.println( );
        }
    }
}

If you run this code, neither pattern matches with the default flags. The first pattern (with the wildcard character .) matches only once Pattern.DOTALL is set, because by default . does not match a line terminator; the second pattern (with $) matches only once Pattern.MULTILINE is set.

> java NLMatch
INPUT: I dream of engines
more engines, all day long

PATTERN engines.more engines
DEFAULT match false
MultiLine match true

PATTERN engines$
DEFAULT match false
MultiLine match true
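
Example 4-6 sets DOTALL and MULTILINE together, which hides which flag is responsible for which match. A minimal sketch that applies each flag on its own to the same input and patterns follows; the class name FlagIsolation and the helper found are hypothetical and not part of the original example.

```java
import java.util.regex.Pattern;

public class FlagIsolation {
    // Same input string as Example 4-6.
    static final String INPUT = "I dream of engines\nmore engines, all day long";

    /** Returns true if the regex is found anywhere in INPUT under the given flags. */
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        String[] patterns = { "engines.more engines", "engines$" };
        for (String p : patterns) {
            System.out.println("PATTERN " + p);
            System.out.println("  no flags:  " + found(p, 0));
            System.out.println("  DOTALL:    " + found(p, Pattern.DOTALL));
            System.out.println("  MULTILINE: " + found(p, Pattern.MULTILINE));
        }
    }
}
```

With this input, the pattern containing . is found only under DOTALL, and the pattern ending in $ is found only under MULTILINE, so each flag can be chosen independently according to which metacharacter the pattern relies on.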
