PHP Web Automation
Converting HTML to Plain Text
Problem
You need to convert HTML to readable, formatted plain text.
Solution
Example: Converting HTML to plain text
require_once 'class.html2text.inc';
/* Give file_get_contents() the path or URL of the HTML you want to process */
$html = file_get_contents(__DIR__ . '/article.html');
$converter = new html2text($html);
$plain_text = $converter->get_text();
Discussion
The html2text class has many formatting rules built in, so the plain text it generates keeps some visual layout for headings, paragraphs, and so on. It also appends a list of all the links in the HTML to the bottom of the generated text.
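To make the idea of built-in formatting rules concrete, here is a minimal sketch of the same kind of processing. It is not the html2text implementation: the function name sketch_html_to_text() and its three rules (uppercase headings, blank lines after paragraphs, a numbered link list at the bottom) are assumptions chosen for illustration only.

```php
<?php
/**
 * Illustrative only: a tiny HTML-to-text converter with three rules.
 * This is NOT the html2text class; the name and rules are hypothetical.
 */
function sketch_html_to_text($html)
{
    $links = [];

    // Rule 1: replace each <a href="..."> with its text plus a [n] marker,
    // and remember the URL for the list at the bottom.
    $text = preg_replace_callback(
        '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is',
        function ($m) use (&$links) {
            $links[] = $m[1];
            return $m[2] . ' [' . count($links) . ']';
        },
        $html
    );

    // Rule 2: render headings in uppercase, set off by blank lines.
    $text = preg_replace_callback(
        '/<h[1-6][^>]*>(.*?)<\/h[1-6]>/is',
        function ($m) {
            return "\n\n" . strtoupper(strip_tags($m[1])) . "\n\n";
        },
        $text
    );

    // Rule 3: end each paragraph with a blank line.
    $text = preg_replace('/<\/p>/i', "\n\n", $text);

    // Drop remaining tags, decode entities, and collapse extra blank lines.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\n{3,}/', "\n\n", $text));

    // Append the collected links, if any.
    if ($links) {
        $text .= "\n\nLinks:";
        foreach ($links as $i => $url) {
            $text .= "\n[" . ($i + 1) . '] ' . $url;
        }
    }

    return $text;
}

echo sketch_html_to_text(
    '<h1>News</h1><p>Read the <a href="http://example.com/">full story</a>.</p>'
);
```

For the sample input, the sketch prints the heading as NEWS, then the paragraph with a [1] marker after the link text, then a "Links:" section listing http://example.com/. A real converter such as html2text handles far more markup than these three rules.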
Note: Version 1.0 of the html2text class uses the /e modifier with preg_replace() in a few places. That modifier is deprecated as of PHP 5.5 (and was removed in PHP 7.0), so under PHP 5.5 and 5.6 it generates deprecation warnings if your error level is configured to include them. To remove those warnings, change the patterns that end in /ie to end in just /i on lines 153, 156, 157, 164, and 170.
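The /e modifier makes preg_replace() evaluate the replacement string as PHP code. Where a replacement relies on that evaluation, preg_replace_callback() is one way to keep the behavior without the deprecated modifier. A minimal sketch, assuming a hypothetical heading rule (the pattern and sample input are made up for illustration, not copied from class.html2text.inc):

```php
<?php
// Hypothetical rule for illustration; not copied from class.html2text.inc.
// Old style (deprecated in PHP 5.5): the replacement string is evaluated as code.
//   $text = preg_replace('/<h1[^>]*>(.+?)<\/h1>/ie', 'strtoupper("\\1")', $html);

$html = '<h1>Release notes</h1>';

// Same effect without /e: the callback receives the array of matches
// and returns the replacement text.
$text = preg_replace_callback(
    '/<h1[^>]*>(.+?)<\/h1>/i',
    function ($matches) {
        return strtoupper($matches[1]);
    },
    $html
);

echo $text; // RELEASE NOTES
```

The callback form also avoids building PHP code out of matched input, which is the main reason the /e modifier was retired.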