Monday, July 8, 2019


PHP Regular Expressions Matching Words


Problem

You want to pull out all words from a string.

Solution

The simplest way to do this is to use the PCRE “word character” character type escape sequence, \w:

$text = "Knock, knock. Who's there? r2d2!";
// preg_match_all() returns the number of matches; the words land in $matches[0]
$count = preg_match_all('/\w+/', $text, $matches);
var_dump($matches[0]);

Discussion

The \w escape sequence matches letters, digits, and underscores. It does not include other punctuation. So the output from the preceding code is:

array(6) {
  [0]=>
  string(5) "Knock"
  [1]=>
  string(5) "knock"
  [2]=>
  string(3) "Who"
  [3]=>
  string(1) "s"
  [4]=>
  string(5) "there"
  [5]=>
  string(4) "r2d2"
}

This is mostly correct, except that Who’s is broken up into Who and s. To extend the pattern to handle English contractions properly, we can match either a single word character or an apostrophe sandwiched between two word characters:

$text = "Knock, knock. Who's there? r2d2!";
$pattern = "/(?:\w'\w|\w)+/";
$count = preg_match_all($pattern, $text, $matches);
var_dump($matches[0]);

(The ?: syntax in this pattern prevents the text that matches the parenthesized subpattern from being “captured.”)
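To compare the two patterns side by side, here is a minimal sketch that wraps preg_match_all() in a small helper. The function name extract_words is hypothetical, chosen only for illustration; it is not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): returns every match of $pattern in $text.
function extract_words(string $text, string $pattern): array
{
    // preg_match_all() returns the number of matches, or false on error.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0]; // index 0 holds the full-pattern matches
}

$text = "Knock, knock. Who's there? r2d2!";

// Plain \w+ splits the contraction at the apostrophe.
print implode(', ', extract_words($text, '/\w+/')) . "\n";
// Knock, knock, Who, s, there, r2d2

// Allowing an apostrophe between two word characters keeps it whole.
print implode(', ', extract_words($text, "/(?:\w'\w|\w)+/")) . "\n";
// Knock, knock, Who's, there, r2d2
```

Because the apostrophe must have a word character on both sides, a leading or trailing apostrophe (for example, single quotes around a word) is still left out of the match.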

With the addition of the u modifier, PHP treats both the pattern and the subject string as UTF-8, so \w also matches letters and digits outside the ASCII range. For example:

$fr = 'Toc, toc. Qui est là? R2D2!';
$fr_count = preg_match_all('/\w+/u', $fr, $matches);
print "The French words are:\n\t";
print implode(', ', $matches[0]) . "\n";

$kr = '노크, 노크. 거기 누구입니까? R2D2!';
$kr_count = preg_match_all('/\w+/u', $kr, $matches);
print "The Korean words are:\n\t";
print implode(', ', $matches[0]) . "\n";

This prints:

The French words are:
	Toc, toc, Qui, est, là, R2D2
The Korean words are:
	노크, 노크, 거기, 누구입니까, R2D2

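Without the u modifier at the end of each pattern, the subject is scanned one byte at a time and, in the default "C" locale, \w matches only ASCII letters, digits, and the underscore. The bytes that make up non-ASCII characters are then skipped: là is cut down to l, and the Korean words drop out entirely, leaving only R2D2.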
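The effect of the modifier can be checked by running the same pattern both ways. This is a sketch that assumes the source file is saved as UTF-8. The result without u can vary with the locale, so no output is shown for it. The last part also checks the return value, because preg_match_all() returns false when the u modifier is applied to a subject that is not valid UTF-8.

```php
<?php
$fr = 'Toc, toc. Qui est là? R2D2!';

// Without u: the subject is scanned byte by byte.
preg_match_all('/\w+/', $fr, $bytewise);

// With u: pattern and subject are treated as UTF-8.
preg_match_all('/\w+/u', $fr, $unicode);

print "Without u: " . implode(', ', $bytewise[0]) . "\n";
print "With u:    " . implode(', ', $unicode[0]) . "\n";

// An invalid UTF-8 subject makes a u-modified match fail outright.
$bad = "caf\xE9"; // a lone Latin-1 byte, not valid UTF-8
if (preg_match_all('/\w+/u', $bad, $m) === false
    && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    print "Subject is not valid UTF-8\n";
}
```

If input may arrive in another encoding, converting it to UTF-8 before matching avoids the silent failure in the last step.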
