
Fixing Unclosed HTML Tags

I am working on a blog layout and need to create an abstract of each post (say, the 15 latest) to show on the homepage. Now, the content I use is already formatted in HTML t…

Solution 1:

There are lots of methods that can be used:

  1. Use a proper HTML parser, like DOMDocument
  2. Use PHP Tidy to repair the unclosed tags
  3. Some would suggest HTML Purifier
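All three tools do real parsing. For comparison, here is a naive, extension-free sketch of the kind of hand-rolled fix they replace: a stack-based tag closer that appends closing tags for elements still open at the end of a truncated fragment. `close_open_tags()` is a hypothetical helper, not part of PHP. It assumes the input was well-nested before truncation, drops a trailing partial tag such as `<a hre`, ignores stray closing tags, and does not understand comments or attribute values containing `>`, so it is no substitute for a parser on untrusted HTML.

```php
<?php
// Hypothetical helper: naive stack-based closer for truncated HTML fragments.
function close_open_tags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'];

    // Drop a trailing partial tag left by truncation, e.g. "<a hre".
    $html = preg_replace('/<[^>]*$/', '', $html);

    // Group 1: "/" for closing tags; group 2: tag name; group 3: "/" for "<x/>".
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $matches, PREG_SET_ORDER);

    $stack = [];
    foreach ($matches as $m) {
        [, $slash, $name, $selfClose] = $m;
        $name = strtolower($name);
        if (in_array($name, $void, true) || $selfClose === '/') {
            continue; // nothing to close later
        }
        if ($slash === '') {
            $stack[] = $name; // opening tag: remember it
            continue;
        }
        // Closing tag: pop back to the nearest matching open element, if any.
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $name) {
                array_splice($stack, $i);
                break;
            }
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($stack) as $name) {
        $html .= '</' . $name . '>';
    }
    return $html;
}
```

For example, `close_open_tags('<p>Hello <b>world')` returns `<p>Hello <b>world</b></p>`. Unlike DOMDocument or Tidy, it leaves a stray `</i>` (as in the example below) in place rather than repairing it.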

Solution 2:

As ajreal said, DOMDocument is a solution.

Example:

$str = "
<html><head><title>test</title></head><body><p>error</i></body></html>
";

$doc = new DOMDocument();
@$doc->loadHTML($str);
echo $doc->saveHTML();

Advantage: DOMDocument ships with PHP and is enabled by default, unlike PHP Tidy, which is a separate extension that has to be installed.
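The `@` operator above hides the parser's warnings entirely. A minimal sketch of an alternative, assuming the standard DOM and libxml extensions, buffers those warnings with `libxml_use_internal_errors()` so you can log or inspect them. `repair_html_document()` is a hypothetical name; the exact error messages depend on the libxml2 version.

```php
<?php
// Sketch: repair a full HTML document with DOMDocument, collecting libxml
// warnings instead of silencing them with the @ operator.
function repair_html_document(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer parser errors
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $errors = libxml_get_errors();                 // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting
    return [$doc->saveHTML(), $errors];
}

[$fixed, $errors] = repair_html_document(
    '<html><head><title>test</title></head><body><p>error</i></body></html>'
);
echo $fixed;
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}
```

Restoring the previous setting keeps the function from changing error handling for the rest of the script.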

Solution 3:

You can use DOMDocument to do it, but be careful of string encoding issues. Also, you'll have to use a complete HTML document, then extract the components you want. Here's an example:

function make_excerpt($rawHtml, $length = 500) {
  // append an ellipsis and "More" link
  $content = substr($rawHtml, 0, $length)
    . '&hellip; <a href="/link-to-somewhere">More &gt;</a>';

  // Detect the string encoding
  $encoding = mb_detect_encoding($content);

  // pass it to the DOMDocument constructor
  $doc = new DOMDocument('', $encoding);

  // Must include the content-type/charset meta tag with $encoding
  // Bad HTML will trigger warnings, suppress those
  @$doc->loadHTML('<html><head>'
    . '<meta http-equiv="content-type" content="text/html; charset='
    . $encoding . '"></head><body>' . trim($content) . '</body></html>');

  // extract the components we want
  $nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
  $html = '';
  $len = $nodes->length;
  for ($i = 0; $i < $len; $i++) {
    $html .= $doc->saveHTML($nodes->item($i));
  }
  return $html;
}

$html = "<p>.......................</p>
  <p>...........
    <p>............</p>
    <p>...........| 500 chars";

// output fixed html
echo make_excerpt($html, 500);

Outputs:

<p>.......................</p><p>...........
    </p><p>............</p><p>...........| 500 chars… <a href="/link-to-somewhere">More &gt;</a></p>

If you are using WordPress, you should wrap the substr() call in wpautop(): wpautop(substr(...)). You may also want to check the length of the $rawHtml passed to the function and skip appending the "More" link when the content is not longer than $length.
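The length check suggested above could look like this sketch. `truncate_with_more()` and its `$moreHref` parameter are hypothetical names, and `/link-to-somewhere` is the same placeholder URL used in the answer. It assumes the mbstring extension and uses `mb_strlen()`/`mb_substr()` so that a multibyte character is not cut in half, which plain `substr()` can do with UTF-8 text. Its return value corresponds to what `make_excerpt()` builds as `$content` before handing it to DOMDocument.

```php
<?php
// Hypothetical helper: truncate and add a "More" link only when needed.
function truncate_with_more(string $rawHtml, int $length = 500, string $moreHref = '/link-to-somewhere'): string
{
    if (mb_strlen($rawHtml) <= $length) {
        return $rawHtml; // short enough: no truncation, no "More" link
    }
    // Character-based cut, so multibyte characters stay intact.
    return mb_substr($rawHtml, 0, $length)
        . '&hellip; <a href="' . htmlspecialchars($moreHref) . '">More &gt;</a>';
}
```

The result still needs the DOMDocument step (or another repair method) afterwards, because the cut can leave tags open.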
