Java Finding the Matching Text - Supercoders | Web Development and Design | Tutorial for Java, PHP, HTML, Javascript Java Finding the Matching Text - Supercoders | Web Development and Design | Tutorial for Java, PHP, HTML, Javascript

Breaking

Post Top Ad

Post Top Ad

Saturday, December 29, 2018

Java Finding the Matching Text

Java Finding the Matching Text


Problem

You need to find the text that the regex matched.

Solution

Sometimes you need to know more than just whether a regex matched a string. In
editors and many other tools, you want to know exactly what characters were
matched. Remember that with multipliers such as * , the length of the text that was
matched may have no relationship to the length of the pattern that matched it. Do
not underestimate the mighty .* , which happily matches thousands or millions of
characters if allowed to. As you saw in the previous recipe, you can find out whether
a given match succeeds just by using find( ) or matches( ) . But in other applications,
you will want to get the characters that the pattern matched.

After a successful call to one of the above methods, you can use these “information”
methods to get information on the match:

start(), end( )
Returns the character position in the string of the starting and ending characters
that matched.

groupCount( )
Returns the number of parenthesized capture groups if any; returns 0 if no
groups were used.

group(int i)
Returns the characters matched by group i of the current match, if i is less than
or equal to the return value of groupCount( ) . Group 0 is the entire match, so
group(0) (or just group( ) ) returns the entire portion of the string that matched.

The notion of parentheses or “capture groups” is central to regex processing. Regexes
may be nested to any level of complexity. The group(int) method lets you retrieve
the characters that matched a given parenthesis group. If you haven’t used any
explicit parens, you can just treat whatever matched as “level zero.” For example:

// Part of REmatch.java
String patt = "Q[^u]\\d+\\.";
Pattern r = Pattern.compile(patt);
String line = "Order QT300. Now!";
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println(patt + " matches \"" +
m.group(0) +
"\" in \"" + line + "\"");
} else {
System.out.println("NO MATCH");
}

When run, this prints:

Q[^u]\d+\. matches "QT300." in "Order QT300. Now!"

It is also possible to get the starting and ending indexes and the length of the text that the pattern matched (remember that terms with multipliers, such as the \d+ in this example, can match an arbitrary number of characters in the string). You can use these in conjunction with the String.substring( ) methods as follows:

// Part of regexsubstr.java -- Prints exactly the same as REmatch.java
Pattern r = Pattern.compile(patt);
String line = "Order QT300. Now!";
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println(patt + " matches \"" +
line.substring(m.start(0), m.end(0)) +
"\" in \"" + line + "\"");
} else {
System.out.println("NO MATCH");
}
}

Suppose you need to extract several items from a string. If the input is:

Smith, John
Adams, John Quincy

and you want to get out:

John Smith
John Quincy Adams

just use:

// from REmatchTwoFields.java
// Construct a regex with parens to "grab" both field1 and field2
Pattern r = Pattern.compile("(.*), (.*)");
Matcher m = r.matcher(inputLine);
if (!m.matches( ))
throw new IllegalArgumentException("Bad input: " + inputLine);
System.out.println(m.group(2) + ' ' + m.group(1));

No comments:

Post a Comment

Post Top Ad