
PHP Issue: simplexml_load_string parser error : Input is not proper UTF-8, indicate encoding !

What?
A quick article to stop me running into this issue again. It covers loading XML that contains characters from another language's character set into PHP with the function simplexml_load_string(). The error I get looks something like this:

PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160


Why?
I'm downloading an XML feed to our servers and then loading the downloaded file into memory with simplexml_load_string(). The error above appears when the script tries to load a feed which is mostly in Spanish, and parsing breaks at the following XML node:

<baños>2</baños>

-> yields issue: PHP Warning:  simplexml_load_string():     <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160

should read

<baños>2</baños>
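Before changing how the file is downloaded, it can help to confirm that the bytes really are the problem. The following is a minimal sketch, not part of my original script: diagnose_xml() is a hypothetical helper name, the sample string is made-up data in which ñ is a single ISO-8859-1 byte, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and collects
// libxml's parser messages instead of letting them surface as PHP warnings.
function diagnose_xml(string $xml): array {
    $previous = libxml_use_internal_errors(true);   // capture errors quietly
    $parsed = simplexml_load_string($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);          // restore the earlier setting
    return [
        'valid_utf8' => mb_check_encoding($xml, 'UTF-8'),
        'parsed'     => $parsed !== false,
        'errors'     => $messages,
    ];
}

// Assumed sample data: "\xF1" is ñ encoded as one ISO-8859-1 byte, not UTF-8.
$report = diagnose_xml("<casa><ba\xF1os>2</ba\xF1os></casa>");
var_dump($report['valid_utf8'], $report['parsed']);
print_r($report['errors']);
```

If valid_utf8 comes back false, the feed is being delivered or saved in some other encoding, and the parser messages should point at the first offending bytes.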


How?

The fix is a two-step process. My issue was with how the file was downloaded with cURL; the XML node should read baños.

The initial command using cURL was:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( $file_content );  // doesn't work and returns a load of parser errors
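Since the download step is where things went wrong, one thing worth checking is which character set the server claims to be sending. This is only a sketch under assumptions: charset_from_content_type() is a hypothetical helper, and the header values passed to it below are invented examples rather than what my feed actually returned.

```php
<?php
// Hypothetical helper: pulls the charset parameter out of a Content-Type
// value such as "text/xml; charset=ISO-8859-1". Returns null when absent.
function charset_from_content_type(?string $content_type): ?string {
    if ($content_type !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $content_type, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Inside get_data(), after curl_exec() and before curl_close(), the real
// value could be read with: curl_getinfo($ch, CURLINFO_CONTENT_TYPE)
var_dump(charset_from_content_type('text/xml; charset=iso-8859-1'));
var_dump(charset_from_content_type('application/xml'));
```

A missing charset does not prove the feed is UTF-8; it just means the header gives no hint and the bytes themselves have to be inspected.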


The tweaked command using cURL is:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = utf8_decode(curl_exec($ch));  // note the utf8_decode function applied here
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( utf8_encode( $file_content ) );  // works!  DONE! Stop reading any further and tell your boss it was always in hand.
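A note for newer PHP versions: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, and according to the PHP manual utf8_decode() replaces bytes that are not valid UTF-8 with a question mark, so the round trip above may silently lose characters. Below is a possible alternative sketch using mbstring. It is not what I ran on the original feed: to_utf8() is a hypothetical helper, and ISO-8859-1 is only an assumed source encoding that would need checking against the real feed.

```php
<?php
// Hypothetical helper: returns $raw unchanged when it is already valid UTF-8,
// otherwise converts it from an assumed source encoding to UTF-8.
function to_utf8(string $raw, string $assumed_source = 'ISO-8859-1'): string {
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumed_source);
}

// Assumed sample data: ñ stored as the single ISO-8859-1 byte 0xF1.
$latin1 = "<casa><ba\xF1os>2</ba\xF1os></casa>";
$xml = simplexml_load_string(to_utf8($latin1));

if ($xml !== false) {
    foreach ($xml->children() as $child) {
        echo $child->getName(), ' = ', (string) $child, PHP_EOL;
    }
}
```

Checking validity first means a feed that already arrives as proper UTF-8 is passed through untouched instead of being encoded a second time.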



Other things I tried but to no avail
The solution above was as easy as that. Here are a number of other things I tried first:
