Tech-Tidbit: HTML Entity to Decimal Translation

It became necessary this week for me to find a way to convert html entities into their decimal equivalents. I’m not very fond of dealing with character encoding and entity issues but I was finally able to get things working. In the end we modified data within the database which meant I wouldn’t get to use this script, but it was a very nice find.

The nice thing about this is that it leaves any existing decimal-format characters alone, and properly translates html entities to the decimal format.

After searching the net for a while I found a piece of code that seemed like a good solution. The original needed a bit of cleanup and tweaking, but here it is:

function htmlentities2unicodeentities ($input) {
  $htmlEntities = array_values (get_html_translation_table (HTML_ENTITIES, ENT_QUOTES));
  $entitiesDecoded = array_keys  (get_html_translation_table (HTML_ENTITIES, ENT_QUOTES));
  $num = count ($entitiesDecoded);
  for ($u = 0; $u < $num; $u++) {
   $utf8Entities[$u] = '&#'.ord($entitiesDecoded[$u]).';';
  }
  return str_replace ($htmlEntities, $utf8Entities, $input);
}

You can test the translation with this:

$test_strs = array('K&#363;lob', 'K&oacute;pavogur');

foreach($test_strs as $test_str){

  print 'ORIGINAL: ' . $test_str . '<br />'."\n";
  print 'TRANSLATED: ' . entities_to_unicode($test_str) . '<p>'."\n";

}

If you’re interested in converting things the other way around, good luck. I’ve been searching around for a while and I just can’t seem to find anything that works well enough without using some long array to translate things. Let me know if you find one!

Possibly related posts:

  1. Tech-Tidbit: Smarty Variables

No Comments »

No comments yet.

RSS feed for comments on this post.

Leave a comment