PHP Issue: simplexml_load_string parser error : Input is not proper UTF-8, indicate encoding !

What?
A quick article to stop me running into this issue again. It covers importing characters from an XML file that uses a different language character set and loading it in PHP with the function simplexml_load_string(). The error I get is something similar to:

PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160


Why?
I'm downloading an XML feed to our servers and then loading the downloaded file into memory with simplexml_load_string(). I get the above error when it attempts to load an XML feed that is mostly in Spanish; it breaks at the following XML node:

<baños>2</baños>

-> yields issue: PHP Warning:  simplexml_load_string():     <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160

should read

<baños>2</baños>
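
To see why the parser complains, here is a minimal sketch that checks the four bytes quoted in the warning above. It assumes the mbstring extension is available, and it assumes the stray byte came from an ISO-8859-1 source (the feed itself does not confirm this). The variable names are illustrative only.

```php
<?php
// The four bytes quoted in the warning: 0xA0 0x3C 0x2F 0x73.
$fromWarning = "\xA0\x3C\x2F\x73";

// In UTF-8, 0xA0 is a continuation byte and cannot start a character,
// so the sequence is not valid UTF-8.
var_dump(mb_check_encoding($fromWarning, 'UTF-8')); // bool(false)

// Assumption: the byte is ISO-8859-1, where 0xA0 is a non-breaking space.
// Converting from that encoding produces the two-byte UTF-8 form 0xC2 0xA0.
$converted = mb_convert_encoding($fromWarning, 'UTF-8', 'ISO-8859-1');
var_dump(mb_check_encoding($converted, 'UTF-8'));   // bool(true)
echo bin2hex($converted), PHP_EOL;                  // c2a03c2f73
```

The same check can be run on the whole downloaded string before handing it to simplexml_load_string().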


How?

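It's a two-step process. My issue was with how the file was downloaded with cURL: the response needs decoding when it is downloaded, and then encoding again just before it is parsed. The XML node should be baños.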

The initial command using cURL was:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( $file_content );  // doesn't work and returns a load of parser errors
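
When the load fails like this, it can help to collect the parser errors rather than let them spill out as warnings. A minimal sketch using PHP's libxml error functions follows; load_xml_or_report() is a hypothetical helper name and is not part of the original script.

```php
<?php
// Hypothetical helper: returns the parsed document, or throws an exception
// carrying every message libxml reported while parsing.
function load_xml_or_report(string $xml): SimpleXMLElement
{
    // Buffer libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Clear the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        throw new RuntimeException("XML parse failed:\n" . implode("\n", $messages));
    }
    return $doc;
}
```

Wrapping the call in try/catch and logging the exception message shows exactly which line of the feed the parser rejected.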


The tweaked command using cURL is:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = utf8_decode(curl_exec($ch));  // note the utf8_decode function applied here
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( utf8_encode( $file_content ) );  // works!  DONE! Stop reading any further and tell your boss it was always in hand.
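
A note for newer PHP versions: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. They only convert between ISO-8859-1 and UTF-8, and utf8_decode() replaces anything it cannot represent with a question mark. The following is a minimal sketch of an alternative, not the method used above. It assumes the mbstring extension is available and that any input which is not valid UTF-8 is ISO-8859-1. to_utf8() is a hypothetical helper name, and the sample string stands in for the downloaded feed.

```php
<?php
// Hypothetical helper (not from the original script): return $raw as UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function to_utf8(string $raw, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumedSource);
}

// Sample data standing in for the downloaded feed:
// the ñ is stored as the single ISO-8859-1 byte 0xF1.
$file_content = "<ba\xF1os>2</ba\xF1os>";

$file_xml = simplexml_load_string(to_utf8($file_content));
echo $file_xml->getName(), ': ', (string) $file_xml, PHP_EOL;
```

One limitation: a feed that is mostly valid UTF-8 with a few stray ISO-8859-1 bytes fails the validity check as a whole, so the entire string is converted and the characters that were already correct end up double-encoded. Test against the real feed before relying on this.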



Other things I tried but to no avail
The solution above was as easy as that. Here are a number of other things I tried first:

  • mysql_set_charset(): No
  • iconv(): No
  • htmlentities(): No
  • preg_replace_callback(): No
  • sxe(): No
  • $xml = simplexml_load_string( utf8_encode($rss) );: No. Oh wait, yes! Sort of: it only works if you also apply utf8_decode() when downloading the XML.
Category: Personal Home Page :: Article: 642

Credit where Credit is Due:


Feel free to copy, redistribute and share this information. All that we ask is that you attribute credit and possibly even a link back to this website as it really helps in our search engine rankings.

Disclaimer: Please note that the information provided on this website is intended for informational purposes only and does not represent a warranty. The opinions expressed are those of the author only. We recommend testing any solutions in a development environment before implementing them in production. The articles are based on our good faith efforts and were current at the time of writing, reflecting our practical experience in a commercial setting.

Thank you for visiting and, as always, we hope this website was of some use to you!

Kind Regards,

Joel Lipman
www.joellipman.com

© 2024 Joel Lipman .com. All Rights Reserved.