PHP Regular Expressions
Reading Records with a Pattern Separator
Problem
You want to read in records from a file, in which each record is separated by a pattern you can match with a regular expression.
Solution
Read the entire file into a string and then split on the regular expression:
// Read the whole file into memory, then split wherever "N) " appears.
$contents = file_get_contents('/path/to/your/file.txt');
$records = preg_split('/[0-9]+\) /', $contents);
Discussion
This breaks apart a numbered list and places the individual list items into array elements. So if you have a list like this:
1) Gödel
2) Escher
3) Bach
you end up with a four-element array, with an empty opening element. That’s because preg_split() assumes the delimiters are between items, but in this case, the numbers are before items:
array(4) {
[0]=>
string(0) ""
[1]=>
string(7) "Gödel
"
[2]=>
string(7) "Escher
"
[3]=>
string(5) "Bach
"
}
From one point of view, this can be a feature, not a bug, because the nth element holds the nth item. But, to compact the array, you can eliminate the first element:
$records = preg_split('/[0-9]+\) /', $contents);
array_shift($records);
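As an alternative to array_shift(), preg_split() accepts a PREG_SPLIT_NO_EMPTY flag that discards empty pieces as it splits. The following is a minimal sketch; the inline sample string is an assumption standing in for the file contents:

```php
<?php
// Sample input standing in for the file contents (an assumption for illustration).
$contents = "1) Gödel\n2) Escher\n3) Bach\n";

// PREG_SPLIT_NO_EMPTY makes preg_split() drop empty pieces, so the
// leading empty element never appears. The -1 means "no limit".
$records = preg_split('/[0-9]+\) /', $contents, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($records)); // int(3)
```

Note that this flag removes every empty piece, not only the first, so an intentionally blank record in the middle of the list would also disappear and the numbering would shift.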
Another modification you might want is to strip newlines from the elements by replacing each newline with the empty string before splitting:
$records = preg_split('/[0-9]+\) /', str_replace("\n",'',$contents));
array_shift($records);
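Removing every newline also joins the lines of any record that spans more than one line. If that matters, one possible variation is to split first and then trim each element, which removes whitespace only at the ends. This is a sketch under assumptions: the sample string, including its two-line second record, is invented for illustration.

```php
<?php
// Sample input (an assumption for illustration); the second record spans two lines.
$contents = "1) Gödel\n2) Escher,\nM. C.\n3) Bach\n";

$records = preg_split('/[0-9]+\) /', $contents);
array_shift($records); // drop the empty leading element

// trim() strips whitespace, including "\n" and "\r", from both ends only,
// so the line break inside the second record is preserved.
$records = array_map('trim', $records);

var_dump($records);
```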
PHP's line-oriented functions, such as fgets() and file(), treat only the newline as a record separator, so this technique is also useful for breaking apart records divided by other strings. However, if you find yourself splitting on a fixed string instead of a regular expression, use explode() rather than preg_split(); it is more efficient because it avoids the regular expression engine.
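For the fixed-string case, a minimal sketch with explode() might look like this. The "--" marker on its own line is a hypothetical separator chosen for the example, not something the recipe prescribes:

```php
<?php
// Sample input (an assumption for illustration): records divided by a
// fixed "--" marker on its own line.
$contents = "Gödel\n--\nEscher\n--\nBach";

// The separator is a literal string, so no regular expression is needed.
$records = explode("\n--\n", $contents);

var_dump($records);
```

Because the separator here sits between records rather than before them, there is no empty leading element to remove.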