PHP Strings
Parsing Fixed-Width Field Data Records
Problem
You need to break apart fixed-width records in strings.
Solution
Example Parsing fixed-width records with substr( )
$fp = fopen('fixed-width-records.txt','r',true) or die ("can't open file");
// compare against false so that a line containing only "0" doesn't end the loop early
while (($s = fgets($fp,1024)) !== false) {
    $fields[1] = substr($s,0,25); // first field: first 25 characters of the line
    $fields[2] = substr($s,25,15); // second field: next 15 characters of the line
    $fields[3] = substr($s,40,4); // third field: next 4 characters of the line
    $fields = array_map('rtrim', $fields); // strip the trailing whitespace
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");
Example Parsing fixed-width records with unpack( )
function fixed_width_unpack($format_string,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $r[$i] = unpack($format_string,$data[$i]);
    }
    return $r;
}
Discussion
Data in which each field is allotted a fixed number of characters per line may look like this list of book titles, authors, and publication years:
$booklist=<<<END
Elmer Gantry             Sinclair Lewis 1927
The Scarlatti InheritanceRobert Ludlum  1971
The Parsifal Mosaic      Robert Ludlum  1982
Sophie's Choice          William Styron 1979
END;
In each line, the title occupies the first 25 characters, the author’s name the next 15 characters, and the publication year the next 4 characters. Knowing those field widths, you can easily use substr() to parse the fields into an array:
$books = explode("\n",$booklist);
for ($i = 0, $j = count($books); $i < $j; $i++) {
    $book_array[$i]['title'] = substr($books[$i],0,25);
    $book_array[$i]['author'] = substr($books[$i],25,15);
    $book_array[$i]['publication_year'] = substr($books[$i],40,4);
}
Exploding $booklist into an array of lines makes the looping code the same whether it’s operating over a string or a series of lines read in from a file.
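As a rough sketch of that point, the loop can be wrapped in a helper and fed lines from either source. The helper name parse_book_lines(), the temporary file, and the use of file() with FILE_IGNORE_NEW_LINES are assumptions made for illustration, not part of the recipe. str_pad() builds the sample lines so the field widths are explicit, and rtrim() is added to drop the padding.

```php
<?php
// Hypothetical helper: the substr() loop, wrapped in a function, with rtrim() added.
function parse_book_lines(array $lines) {
    $parsed = array();
    foreach ($lines as $line) {
        $parsed[] = array(
            'title'            => rtrim(substr($line, 0, 25)),
            'author'           => rtrim(substr($line, 25, 15)),
            'publication_year' => substr($line, 40, 4),
        );
    }
    return $parsed;
}

// Two sample records, padded to 25 + 15 + 4 characters
$booklist = str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 15) . "1927\n"
          . str_pad('The Parsifal Mosaic', 25) . str_pad('Robert Ludlum', 15) . '1982';

// Source 1: a string exploded into an array of lines
$from_string = parse_book_lines(explode("\n", $booklist));

// Source 2: the same data read line by line from a temporary file
$tmp = tempnam(sys_get_temp_dir(), 'fw');
file_put_contents($tmp, $booklist . "\n");
$from_file = parse_book_lines(file($tmp, FILE_IGNORE_NEW_LINES));
unlink($tmp);

// Both sources yield the same parsed structure
var_dump($from_string === $from_file);
```

Because the parsing code only ever sees an array of lines, it does not need to know where those lines came from.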
Example fixed_width_substr( )
function fixed_width_substr($fields,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $line_pos = 0;
        foreach($fields as $field_name => $field_length) {
            $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
            $line_pos += $field_length;
        }
    }
    return $r;
}
$book_fields = array('title' => 25,
                     'author' => 15,
                     'publication_year' => 4);
// pass the array of lines ($books), not the raw $booklist string
$book_array = fixed_width_substr($book_fields,$books);
The variable $line_pos keeps track of the start of each field and is advanced by the previous field’s width as the code moves through each line. Use rtrim() to remove trailing whitespace from each field.
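To make the offset bookkeeping concrete, here is a small self-contained run of the function. The two sample lines are built with str_pad() purely for illustration, so the 25/15/4 widths are visible in the code; that construction is an assumption of this sketch, not part of the recipe.

```php
<?php
function fixed_width_substr($fields, $data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $line_pos = 0; // each line starts again at offset 0
        foreach ($fields as $field_name => $field_length) {
            $r[$i][$field_name] = rtrim(substr($data[$i], $line_pos, $field_length));
            $line_pos += $field_length; // 0 -> 25 -> 40 -> 44 for the book layout
        }
    }
    return $r;
}

// Sample lines padded to the field widths (illustrative data construction)
$books = array(
    str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 15) . '1927',
    str_pad('The Parsifal Mosaic', 25) . str_pad('Robert Ludlum', 15) . '1982',
);

$book_fields = array('title' => 25, 'author' => 15, 'publication_year' => 4);
$book_array = fixed_width_substr($book_fields, $books);

print_r($book_array[1]);
```

With these two lines, $book_array[1] holds the title "The Parsifal Mosaic", the author "Robert Ludlum", and the publication year "1982", all without trailing padding.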
Example fixed_width_unpack( )
function fixed_width_unpack($format_string,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $r[$i] = unpack($format_string,$data[$i]);
    }
    return $r;
}
Because the A format code in unpack() denotes a space-padded string, unpack() strips the trailing spaces itself and there’s no need to call rtrim(). Once the fields have been parsed into $book_array by either function, the data can be printed as an HTML table, for example:
$book_array = fixed_width_unpack('A25title/A15author/A4publication_year', $books);
print "<table>\n";
// print a header row
print '<tr><td>';
print join('</td><td>',array_keys($book_array[0]));
print "</td></tr>\n";
// print each data row
foreach ($book_array as $row) {
    print '<tr><td>';
    print join('</td><td>',array_values($row));
    print "</td></tr>\n";
}
print "</table>\n";
Joining data on </td><td> produces a table row that is missing its first <td> and last </td>. We produce a complete table row by printing out <tr><td> before the joined data and </td></tr> after the joined data.
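The same wrapping step can be folded into a small helper, sketched below. The function name table_row() is hypothetical. The call to htmlspecialchars() is an added assumption (that field values might contain characters such as & or <), not something the recipe itself does.

```php
<?php
// Hypothetical helper: build one complete table row from an array of cell values.
function table_row(array $cells) {
    // Assumption: cell text may contain &, < or >, so escape it before output
    $cells = array_map('htmlspecialchars', $cells);
    // join() supplies the inner "</td><td>" separators;
    // the outer "<tr><td>" and "</td></tr>" complete the row
    return '<tr><td>' . join('</td><td>', $cells) . "</td></tr>\n";
}

print table_row(array('Elmer Gantry', 'Sinclair Lewis', '1927'));
// prints: <tr><td>Elmer Gantry</td><td>Sinclair Lewis</td><td>1927</td></tr>
```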
substr() and unpack() are equally capable when the fixed-width fields are strings, but unpack() is the better solution when the fields aren’t just strings, for example when some of them hold binary-packed integers.
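A minimal sketch of a record that mixes a string with binary numbers follows. The layout (a 10-byte name, a 16-bit quantity, a 32-bit ID) and the field names are invented for illustration. The format codes A, n, and N are standard pack()/unpack() codes for a space-padded string, an unsigned 16-bit big-endian integer, and an unsigned 32-bit big-endian integer.

```php
<?php
// Hypothetical record layout: 10-byte space-padded name,
// 16-bit unsigned big-endian quantity, 32-bit unsigned big-endian ID.
$record = pack('A10nN', 'widget', 300, 70000);

// The record is always 10 + 2 + 4 = 16 bytes long
print strlen($record) . "\n";

// unpack() strips the padding from the A field and decodes both integers
$fields = unpack('A10name/nquantity/Nid', $record);
print_r($fields);

// substr() can only return the raw bytes of the quantity field;
// they would still need decoding (300 is 0x012c)
print bin2hex(substr($record, 10, 2)) . "\n";
```

Here unpack() returns 'widget', 300, and 70000 under the keys name, quantity, and id, whereas substr() hands back two undecoded bytes for the quantity.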
If all of your fields are the same size, str_split() is a handy shortcut for chopping up incoming data. It returns an array made up of sections of a string.
Example Chopping up a string with str_split( )
$fields = str_split($line_of_data,32);
// $fields[0] is bytes 0 - 31
// $fields[1] is bytes 32 - 63
// and so on
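One detail worth seeing in action: when the string length is not an exact multiple of the chunk size, the last element is simply shorter. The sample data below is made up for illustration.

```php
<?php
// Illustrative input: two full 32-byte fields followed by a 3-byte remainder
$line_of_data = str_repeat('a', 32) . str_repeat('b', 32) . 'ccc';

$fields = str_split($line_of_data, 32);

// Report the size of each chunk: 32, 32, and 3 bytes
foreach ($fields as $i => $field) {
    printf("%d: %d bytes\n", $i, strlen($field));
}
```

Note that str_split() counts bytes, not characters, so data containing multibyte characters can be split in the middle of a character. If that matters for your data, mb_str_split() (available since PHP 7.4) may be a better fit.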