PHP Strings Parsing Fixed-Width Field Data Records - Supercoders | Web Development and Design | Tutorial for Java, PHP, HTML, Javascript

Tuesday, May 7, 2019

PHP Strings Parsing Fixed-Width Field Data Records

PHP Strings




Parsing Fixed-Width Field Data Records


Problem

You need to break apart fixed-width records in strings.


Solution

Example   Parsing fixed-width records with substr( )

         $fp = fopen('fixed-width-records.txt','r',true) or die ("can't open file");
         while (($s = fgets($fp,1024)) !== false) { // stop only at end of file, not on a falsy line such as "0"
              $fields[1] = substr($s,0,25); // first field: first 25 characters of the line
              $fields[2] = substr($s,25,15); // second field: next 15 characters of the line
              $fields[3] = substr($s,40,4); // third field: next 4 characters of the line
              $fields = array_map('rtrim', $fields); // strip the trailing whitespace
              // a function to do something with the fields
              process_fields($fields);
         }
         fclose($fp) or die("can't close file");

Example   Parsing fixed-width records with unpack( )

         function fixed_width_unpack($format_string,$data) {
              $r = array();
              for ($i = 0, $j = count($data); $i < $j; $i++) {
                $r[$i] = unpack($format_string,$data[$i]);
              }
              return $r;
         }

Discussion

Data in which each field is allotted a fixed number of characters per line may look like this list of book titles, authors, and publication years:

         $booklist=<<<END
         Elmer Gantry             Sinclair Lewis 1927
         The Scarlatti InheritanceRobert Ludlum  1971
         The Parsifal Mosaic      Robert Ludlum  1982
         Sophie's Choice          William Styron 1979
         END;

In each line, the title occupies the first 25 characters, the author’s name the next 15 characters, and the publication year the next 4 characters. Knowing those field widths, you can easily use substr() to parse the fields into an array:

         $books = explode("\n",$booklist);

         for($i = 0, $j = count($books); $i < $j; $i++) {
             $book_array[$i]['title'] = substr($books[$i],0,25);
             $book_array[$i]['author'] = substr($books[$i],25,15);
             $book_array[$i]['publication_year'] = substr($books[$i],40,4);
         }

Exploding $booklist into an array of lines makes the looping code the same whether it’s operating over a string or a series of lines read in from a file.
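
As a rough illustration of that point, the sketch below builds the same array of lines twice: once with explode() on a string and once with file() on a file. The sample records, the temporary file, and the variable names are assumptions made for the demonstration.

```php
<?php
// Hypothetical sample records, padded to the 25/15/4 layout used in this recipe.
$lines = array(
    str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 15) . '1927',
    str_pad("Sophie's Choice", 25) . str_pad('William Styron', 15) . '1979',
);
$text = implode("\n", $lines);

// Write the same data to a temporary file so both sources can be compared.
$path = tempnam(sys_get_temp_dir(), 'fw');
file_put_contents($path, $text);

$from_string = explode("\n", $text);              // lines from a string
$from_file = file($path, FILE_IGNORE_NEW_LINES);  // lines from a file, newlines removed
unlink($path);

var_dump($from_string === $from_file);
```

Either array can then be handed to the same parsing loop. One caveat: if the string ends with a newline, explode() produces a trailing empty element that file() does not.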

Example   fixed_width_substr( )

         function fixed_width_substr($fields,$data) {
             $r = array();
             for ($i = 0, $j = count($data); $i < $j; $i++) {
                 $line_pos = 0;
                 foreach($fields as $field_name => $field_length) {
                      $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
                      $line_pos += $field_length;
                 }
             }
             return $r;
         }

         $book_fields = array('title' => 25,
                              'author' => 15,
                              'publication_year' => 4);

         // pass the array of lines ($books), not the raw $booklist string
         $book_array = fixed_width_substr($book_fields,$books);

The variable $line_pos keeps track of the start of each field and is advanced by the previous field’s width as the code moves through each line. Use rtrim() to remove trailing whitespace from each field.
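
To make the bookkeeping concrete, here is a minimal sketch that walks one record and records where each field starts. The sample line and the $offsets array are assumptions added for illustration; they are not part of fixed_width_substr().

```php
<?php
// Hypothetical single record, padded to the 25/15/4 layout.
$line = str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 15) . '1927';
$widths = array('title' => 25, 'author' => 15, 'publication_year' => 4);

$line_pos = 0;
$offsets = array();
$record = array();
foreach ($widths as $name => $width) {
    $offsets[$name] = $line_pos;                              // where this field starts
    $record[$name] = rtrim(substr($line, $line_pos, $width)); // field value without padding
    $line_pos += $width;                                      // start of the next field
}

print_r($offsets); // title => 0, author => 25, publication_year => 40
print_r($record);
```

Because the offsets are derived from the widths, adding or resizing a field only means changing the array of widths.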

Example  fixed_width_unpack( )

         function fixed_width_unpack($format_string,$data) {
              $r = array();
              for ($i = 0, $j = count($data); $i < $j; $i++) {
                  $r[$i] = unpack($format_string,$data[$i]);
              }
              return $r;
         }
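
To see what unpack() returns for a single record, the following minimal sketch runs one format string against one hypothetical line (built here with str_pad() purely for illustration). Each slash-separated element of the format string is a format code, a width, and the key to store the value under, so A25title reads 25 characters into the title key.

```php
<?php
// Hypothetical record, padded to the 25/15/4 layout.
$line = str_pad('The Parsifal Mosaic', 25) . str_pad('Robert Ludlum', 15) . '1982';

// Format code A, then the field width, then the array key for the result.
$fields = unpack('A25title/A15author/A4publication_year', $line);

var_export($fields);
```

The result is an associative array keyed by the names in the format string, with every value returned as a string.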

Because the A format code tells unpack() to treat a field as a space-padded string, it strips the trailing spaces itself, so there’s no need to call rtrim(). Once the fields have been parsed into $book_array by either function, the data can be printed as an HTML table, for example:

         $book_array = fixed_width_unpack('A25title/A15author/A4publication_year', $books);
         print "<table>\n";
         // print a header row
         print '<tr><td>';
         print join('</td><td>',array_keys($book_array[0]));
         print "</td></tr>\n";
         // print each data row
         foreach ($book_array as $row) {
                 print '<tr><td>';
                 print join('</td><td>',array_values($row));
                 print "</td></tr>\n";
         }
         print "</table>\n";

Joining data on </td><td> produces a table row that is missing its first <td> and last </td>. We produce a complete table row by printing out <tr><td> before the joined data and </td></tr> after the joined data.
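
The row-building step can be looked at in isolation. The sketch below is one possible variation, not the original listing: the helper name table_row() is hypothetical, it uses implode() (join() is an alias for it), and it additionally passes each value through htmlspecialchars() so that characters such as & or < in the data do not break the markup.

```php
<?php
// Hypothetical helper: wrap a list of cell values in one complete table row.
function table_row(array $cells) {
    // Escape each value so that markup characters in the data are shown literally.
    $escaped = array_map('htmlspecialchars', $cells);
    return '<tr><td>' . implode('</td><td>', $escaped) . "</td></tr>\n";
}

$row = array('title' => 'Elmer Gantry', 'author' => 'Sinclair Lewis', 'publication_year' => '1927');

print table_row(array_keys($row)); // header row built from the field names
print table_row($row);             // data row built from the field values
```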

substr() and unpack() are equally capable when every fixed-width field is a string, but unpack() is the better choice when some fields hold other kinds of data, such as packed binary integers.
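
As a sketch of the non-string case, assume a hypothetical record layout with a 10-byte space-padded name followed by a 16-bit and a 32-bit unsigned big-endian integer. The layout, the field names, and the use of pack() to fabricate a sample record are all assumptions for illustration.

```php
<?php
// Build a sample 16-byte record: A10 = space-padded string,
// n = unsigned 16-bit big-endian, N = unsigned 32-bit big-endian.
$record = pack('A10nN', 'widget', 12, 1999);

// One format string decodes the text field and both binary integers.
$fields = unpack('A10name/nquantity/Nprice_cents', $record);

var_export($fields);
```

Here unpack() returns the name as a trimmed string and the two numeric fields as PHP integers, which substr() alone could not decode.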

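If all of your fields are the same size, str_split() is a handy shortcut for chopping up incoming data. It returns an array of consecutive, equal-length chunks of the string; the last chunk is shorter if the string’s length is not a multiple of the chunk size.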

Example  Chopping up a string with str_split( )

          $fields = str_split($line_of_data,32);
          // $fields[0] is bytes 0 - 31
          // $fields[1] is bytes 32 - 63
          // and so on
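
For a concrete run, the sketch below applies str_split() to a hypothetical line of three 8-byte fields and trims the padding afterward. The data and the 8-byte width are assumptions chosen to keep the example short.

```php
<?php
// Hypothetical line made of three 8-byte, space-padded fields.
$line_of_data = str_pad('alpha', 8) . str_pad('beta', 8) . str_pad('gamma', 8);

// Split into 8-byte chunks, then strip the trailing padding from each one.
$fields = array_map('rtrim', str_split($line_of_data, 8));

print_r($fields);
```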

