Java Printing All Occurrences of a Pattern
Problem
You need to find all the strings that match a given regex in one or more files or other
sources.
Solution
This example reads through a file one line at a time. Whenever a match is found, I
extract it from the line and print it.
This code takes the group( ) methods from Recipe 4.3, the substring( ) method from
the String class, and the find( ) method from the Matcher class and simply
puts them all together. I coded it to extract all the “names” from a given file; in
running the program through itself, it prints the words “import”, “java”, “util”,
“regex”, and so on:
> jikes +E -d . ReaderIter.java
> java ReaderIter ReaderIter.java
import
java
util
regex
import
java
io
Print
all
the
strings
that
match
given
pattern
from
file
public
I interrupted it here to save paper. This can be written two ways: a traditional “line
at a time” pattern, shown in Example 4-3, and a more compact form using “new I/O”,
shown in Example 4-4.
Example 4-3. ReaderIter.java

import java.util.regex.*;
import java.io.*;

/**
 * Print all the strings that match a given pattern from a file.
 */
public class ReaderIter {
    public static void main(String[] args) throws IOException {
        // The regex pattern
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        // A FileReader (see the I/O chapter)
        BufferedReader r = new BufferedReader(new FileReader(args[0]));

        // For each line of input, try matching in it.
        String line;
        while ((line = r.readLine( )) != null) {
            // For each match in the line, extract and print it.
            Matcher m = patt.matcher(line);
            while (m.find( )) {
                // Simplest method:
                // System.out.println(m.group(0));

                // Get the starting position of the text
                int start = m.start(0);
                // Get ending position
                int end = m.end(0);
                // Print whatever matched.
                System.out.println("start=" + start + "; end=" + end);
                // Use String.substring(start, end)
                System.out.println(line.substring(start, end));
            }
        }
        // Release the file once all lines have been read.
        r.close( );
    }
}
Example 4-4. GrepNIO.java

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.regex.*;

/* Grep-like program using NIO, but NOT LINE BASED.
 * Pattern and file name(s) must be on command line.
 */
public class GrepNIO {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: GrepNIO patt file [...]");
            System.exit(1);
        }

        Pattern p = Pattern.compile(args[0]);
        for (int i=1; i<args.length; i++)
            process(p, args[i]);
    }

    static void process(Pattern pattern, String fileName) throws IOException {
        // Get a FileChannel from the given file.
        FileChannel fc = new FileInputStream(fileName).getChannel( );

        // Map the file's content
        ByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size( ));

        // Decode ByteBuffer into CharBuffer
        CharBuffer cbuf =
            Charset.forName("ISO-8859-1").newDecoder( ).decode(buf);

        Matcher m = pattern.matcher(cbuf);
        while (m.find( )) {
            System.out.println(m.group(0));
        }
    }
}
The NIO version shown in Example 4-4 relies on the fact that an NIO CharBuffer can be
used as a CharSequence. This program is more general in that the pattern is
taken from the command line. It prints the same matched strings as the previous
example if invoked with the pattern from the previous program on the
command line:
java GrepNIO "[A-Za-z][a-z]+" ReaderIter.java
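To see the CharSequence point in isolation, here is a minimal sketch that matches directly against a CharBuffer without touching a file. The class name CharBufferMatchDemo and the helper findAll are hypothetical names used only for illustration; CharBuffer.wrap and Pattern.matcher(CharSequence) are standard library calls.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharBufferMatchDemo {
    /** Collect every match of the pattern in any CharSequence (hypothetical helper). */
    static List<String> findAll(Pattern pattern, CharSequence input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        // CharBuffer implements CharSequence, so it can be handed to matcher() as-is.
        CharBuffer cbuf = CharBuffer.wrap("import java.util.regex.*;");
        System.out.println(findAll(patt, cbuf)); // prints [import, java, util, regex]
    }
}
```

Because findAll accepts any CharSequence, the same helper works unchanged for a String, a StringBuilder, or a decoded CharBuffer like the one in GrepNIO.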
You might think of using \w+ as the pattern; the difference is that my pattern
looks for well-formed capitalized words, while \w+ would match Java-centric
oddities like theVariableName, which have capitals in nonstandard positions, as a
single word. Note that \w+ also matches digits and underscores.
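The difference between the two patterns is easy to check on a single line of Java source. This is a small sketch; the class name PatternCompareDemo, the helper findAll, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCompareDemo {
    /** Return all matches of regex in input, in order (hypothetical helper). */
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "String theVariableName = args[0];";
        // The recipe's pattern splits the identifier at each capital letter.
        System.out.println(findAll("[A-Za-z][a-z]+", line));
        // prints [String, the, Variable, Name, args]

        // \w+ keeps the identifier whole and also picks up the digit.
        System.out.println(findAll("\\w+", line));
        // prints [String, theVariableName, args, 0]
    }
}
```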
Also note that the NIO version will probably be more efficient, since it doesn’t
create a new Matcher for each line of input as ReaderIter does.
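If you stay with the line-at-a-time approach, one way to avoid allocating a Matcher per line is to create it once and point it at each new line with Matcher.reset(CharSequence), which is a standard method. The following is only a sketch: the class name MatcherResetDemo and the helper findAllInLines are hypothetical, the lines come from a list rather than a file, and whether this is measurably faster has not been tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherResetDemo {
    /** Find all matches across several lines using one reusable Matcher. */
    static List<String> findAllInLines(Pattern pattern, List<String> lines) {
        List<String> found = new ArrayList<>();
        // Create the Matcher once, with empty input to start.
        Matcher m = pattern.matcher("");
        for (String line : lines) {
            // reset(CharSequence) switches the same Matcher to the new line.
            m.reset(line);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        List<String> lines = Arrays.asList(
            "import java.io.*;",
            "public class ReaderIter {");
        System.out.println(findAllInLines(patt, lines));
        // prints [import, java, io, public, class, Reader, Iter]
    }
}
```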