PHP Performance Tuning
Avoiding Regular Expressions
Problem
You want to improve script performance by optimizing string-matching operations.
Solution
Replace unnecessary regular expression calls with faster string and character type function alternatives.
Discussion
A common source of unnecessary computation is the use of regular expression functions when they are not needed. For example, suppose you’re validating a form submission and want to make sure that the username contains only alphanumeric characters.
A common approach to this problem is a regular expression:
if (!preg_match('/^[a-z0-9]+$/i', $username)) {
    echo 'please enter a valid username.';
}
The same test can be performed much faster with the ctype_alnum() function.
Using code-timing techniques, let’s compare the preceding test with ctype_alnum():
$username = 'foo411';
// Time the regular expression test (anchored at both ends, as in the earlier example)
$start = microtime(true);
if (!preg_match('/^[a-z0-9]+$/i', $username)) {
    echo 'please enter a valid username';
}
$regextime = microtime(true) - $start;
// Time the equivalent ctype_alnum() test
$start = microtime(true);
if (!ctype_alnum($username)) {
    echo 'please enter a valid username';
}
$ctypetime = microtime(true) - $start;
echo "preg_match took: $regextime seconds\n";
echo "ctype_alnum took: $ctypetime seconds\n";
This will output results similar to:
preg_match took: 0.000163078308105 seconds
ctype_alnum took: 9.05990600586E-06 seconds
ctype_alnum() is considerably faster: 9.05990600586E-06 seconds is about 0.00000906 seconds, which makes it roughly 18 times faster than the preg_match() call in this run, with the same result for this input.
When applied to a complex application, replacing unnecessary regular expressions with equivalent alternatives can add up to a significant performance gain.
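A single timed call is sensitive to noise, so it can help to repeat each test many times before comparing. The following is a minimal sketch of that idea, not a rigorous benchmark: the helper name time_calls() and the iteration count are assumptions made for illustration, and the closure call adds a small fixed overhead to both measurements.

```php
<?php
// Hypothetical helper (not part of PHP): returns the seconds taken
// to call $test $iterations times.
function time_calls(callable $test, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $test();
    }
    return microtime(true) - $start;
}

$username   = 'foo411';
$iterations = 100000; // arbitrary; chosen only to smooth out per-call noise

$regextime = time_calls(function () use ($username) {
    return preg_match('/^[a-z0-9]+$/i', $username) === 1;
}, $iterations);

$ctypetime = time_calls(function () use ($username) {
    return ctype_alnum($username);
}, $iterations);

printf("preg_match:  %.6f seconds for %d calls\n", $regextime, $iterations);
printf("ctype_alnum: %.6f seconds for %d calls\n", $ctypetime, $iterations);
```

The absolute numbers will vary by machine and PHP version; only the relative difference between the two totals is meaningful.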
A good litmus test for using a regular expression (or not) is to see whether the match you’re performing can be explained in a brief sentence. Granted, there are some matches, such as “string is a valid email address,” which cannot be adequately verified without a complex regular expression. However, “check if string A contains string B” can be tested with several different approaches, but is ultimately a very simple test that does not require regular expressions:
$haystack = 'The quick brown fox jumps over the lazy dog';
$needle = 'lazy dog';
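// slowest (deprecated in PHP 5.3 and removed in PHP 7.0; shown for comparison only)
// if (ereg($needle, $haystack)) echo 'match!';
// slow (preg_quote() escapes any regex metacharacters in $needle)
if (preg_match('/' . preg_quote($needle, '/') . '/', $haystack)) echo 'match!';
// fast (compare against false, because a match such as "0" is falsy)
if (strstr($haystack, $needle) !== false) echo 'match!';
// fastest
if (strpos($haystack, $needle) !== false) echo 'match!';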
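On PHP 8.0 and later, the built-in str_contains() expresses the “A contains B” test directly. The sketch below wraps it with a strpos() fallback for older versions. The wrapper name contains() is hypothetical, chosen only for this example.

```php
<?php
// Hypothetical wrapper (contains() is not a PHP built-in).
function contains(string $haystack, string $needle): bool
{
    if ($needle === '') {
        // Mirrors str_contains(): every string contains the empty string.
        return true;
    }
    if (function_exists('str_contains')) {
        return str_contains($haystack, $needle); // PHP 8.0+
    }
    return strpos($haystack, $needle) !== false; // older PHP versions
}

$haystack = 'The quick brown fox jumps over the lazy dog';
if (contains($haystack, 'lazy dog')) {
    echo "match!\n";
}
```

The strict comparison with false matters in the fallback, because strpos() returns 0 when the needle is found at the very start of the string.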
There is certainly a benefit to double-checking the ctype and string functions before committing to a regular expression, particularly if you’re working on a section of code that will loop repeatedly.
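As a rough guide to what the ctype family can replace, the sketch below pairs a few ctype functions with the patterns they stand in for and reports any input on which the two disagree. The sample strings are made up for illustration and are ASCII only; the ctype functions are locale-dependent, so non-ASCII input may behave differently.

```php
<?php
// Each ctype function paired with a pattern it can replace. The \z anchor
// (rather than $) keeps the pattern from accepting a trailing newline,
// which the ctype functions reject.
$checks = [
    'ctype_digit'  => '/^[0-9]+\z/',
    'ctype_alpha'  => '/^[a-z]+\z/i',
    'ctype_alnum'  => '/^[a-z0-9]+\z/i',
    'ctype_xdigit' => '/^[0-9a-f]+\z/i',
];

// Illustrative inputs only.
$samples = ['foo411', '2024', 'abc', 'beef', 'not valid!', "foo\n", ''];

foreach ($checks as $function => $pattern) {
    foreach ($samples as $sample) {
        $viaCtype = $function($sample);
        $viaRegex = preg_match($pattern, $sample) === 1;
        if ($viaCtype !== $viaRegex) {
            echo "$function disagrees with $pattern on "
                . var_export($sample, true) . "\n";
        }
    }
}
```

Running a comparison like this over representative input is a cheap way to confirm that a ctype function really is a drop-in replacement before swapping it into a hot loop.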