Posted By

etangle on 01/26/10


Tagged

hex utf8 ucs2


UCS2/HexEncoded characters to UTF8 in php


 / Published in: PHP
 

URL: http://stackoverflow.com/questions/2005358/ucs2-hexencoded-characters-to-utf8-in-php

You can recompose a hex representation by converting the hexadecimal chars with hexdec(), repacking the component chars, and then using mb_convert_encoding() to convert from UCS-2 into UTF-8. As I mentioned in my answer to your other question, you'll still need to be careful with the output encoding, although here you've specifically requested UTF-8, so we'll use that for the upcoming sample.

Here's a sample that does the work of converting UCS-2 in hex to UTF-8 in native string form. At the time of writing, PHP doesn't ship with a hex2bin() function, which would make things very easy, so we'll use the one posted at the reference link. I've renamed it to local_hex2bin() in case it conflicts with a future version of PHP or with a definition in some other third-party code that you include in your project.
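For readers on newer PHP releases: a built-in hex2bin() was added in PHP 5.4, and pack('H*', ...) does the same hex-to-bytes step on older versions. The following is only a minimal sketch of a guarded wrapper; the name hex_to_bytes() is hypothetical and not part of the original answer, and returning null for malformed input is an assumed convention.

```php
<?php
// Hypothetical helper name, chosen so it cannot clash with the built-in hex2bin().
function hex_to_bytes($hex)
{
    // Whole bytes need an even number of hex digits.
    // ctype_xdigit() is false for '' and for any non-hex character.
    if (!is_string($hex) || strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        return null;
    }
    // Prefer the built-in where it exists (PHP 5.4+); fall back to pack('H*').
    return function_exists('hex2bin') ? hex2bin($hex) : pack('H*', $hex);
}

var_dump(hex_to_bytes('00480065') === "\x00H\x00e"); // bool(true)
var_dump(hex_to_bytes('abc'));                       // NULL (odd length)
```

Validating first avoids the warning that hex2bin() raises on odd-length or non-hex input, which matters when the hex comes straight from a query string.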

Locally, I called this sample page UCS2HexToUTF8.php, and then used a querystring to set the output.

UCS2HexToUTF8.php?06450631062d0628064b06270020063906270644064500200021

  <?php
  // Convert a hex string (two hex digits per byte) into a binary string.
  function local_hex2bin($h)
  {
      // Reject non-strings and odd-length input, which cannot map to whole bytes.
      if (!is_string($h) || strlen($h) % 2 !== 0) return null;
      $r = '';
      for ($a = 0; $a < strlen($h); $a += 2) {
          // String offsets use [] (the {} form was removed in PHP 8).
          $r .= chr(hexdec($h[$a] . $h[$a + 1]));
      }
      return $r;
  }

  header('Content-Type: text/html; charset=UTF-8');
  mb_http_output('UTF-8');
  echo '<html><head>';
  echo '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';
  echo '</head><body>';
  echo 'output encoding: '.mb_http_output().'<br />';
  $querystring = $_SERVER['QUERY_STRING'];
  // NOTE: we could substitute one of the following:
  // $querystring = '06450631062d0628064b06270020063906270644064500200021';
  // $querystring = '00480065006C006C006F';
  // Cast so that a null result (invalid input) becomes an empty string.
  $ucs2string = (string) local_hex2bin($querystring);
  // NOTE: The source encoding could also be UTF-16 here.
  // TODO: Should check byte-order-mark, if available, in case
  // 16-bit-aligned bytes are reversed.
  $utf8string = mb_convert_encoding($ucs2string, 'UTF-8', 'UCS-2');
  // Escape before echoing: the query string is user-controlled input.
  echo 'query string: '.htmlspecialchars($querystring, ENT_QUOTES, 'UTF-8').'<br />';
  echo 'converted string: '.htmlspecialchars($utf8string, ENT_QUOTES, 'UTF-8').'<br />';
  echo '</body></html>';
  ?>
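The TODO about the byte-order mark could be handled roughly as follows. This is a sketch under stated assumptions, not part of the original answer: the function name ucs2_hex_to_utf8() is hypothetical, input without a BOM is assumed to be big-endian, and malformed input returns null. It relies on the mbstring encoding names UCS-2BE and UCS-2LE.

```php
<?php
// Hypothetical helper: decode hex-encoded UCS-2 to UTF-8, honouring a leading BOM.
function ucs2_hex_to_utf8($hex)
{
    // Each UCS-2 code unit is 16 bits, i.e. four hex digits.
    if (!is_string($hex) || strlen($hex) % 4 !== 0 || !ctype_xdigit($hex)) {
        return null;
    }
    $bytes = pack('H*', $hex); // hex digits -> raw bytes

    // Assumption: with no BOM present, treat the data as big-endian.
    $encoding = 'UCS-2BE';
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: strip it, keep the default encoding.
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it and switch the byte order.
        $encoding = 'UCS-2LE';
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}

echo ucs2_hex_to_utf8('00480065006C006C006F'), "\n";     // Hello
echo ucs2_hex_to_utf8('FFFE480065006C006C006F00'), "\n"; // Hello
```

Stripping the BOM before converting keeps it out of the output; whether big-endian is the right default depends on where the hex data comes from.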
