Welcome to Joel Lipman .Com

PHP Issue: simplexml_load_string parser error : Input is not proper UTF-8, indicate encoding !

Tuesday, 2nd August 2016
What?
A quick article to stop me running into this issue again. It covers importing characters from an XML feed that uses a different language character set and loading that feed in PHP with simplexml_load_string(). The error I get is similar to:
PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160
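
The bytes quoted in the warning can be inspected directly. As a minimal sketch (the variable names are my own, not from the script that raised the warning), this shows why the parser rejects the sequence as UTF-8. Reading 0xA0 as a non-breaking space would additionally assume the feed is ISO-8859-1, which is a guess rather than something the warning states.

```php
<?php
// The four bytes reported in the warning above.
$bytes = "\xA0\x3C\x2F\x73";

// preg_match() with the /u modifier returns false when the subject
// is not valid UTF-8, so it doubles as a quick validity check.
$isValidUtf8 = preg_match('//u', $bytes) === 1;

// In UTF-8, bytes of the form 10xxxxxx are continuation bytes and
// cannot start a character. 0xA0 is 10100000 in binary.
$startsWithContinuationByte = (ord($bytes[0]) & 0xC0) === 0x80;

// The remaining three bytes are plain ASCII.
$tail = substr($bytes, 1);

var_dump($isValidUtf8);                 // bool(false)
var_dump($startsWithContinuationByte);  // bool(true)
var_dump($tail);                        // string(3) "</s"
```

So the parser stopped at a stray high byte sitting just before a closing tag that starts with "</s".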

Why?
I'm downloading an XML feed to our servers, and then loading the downloaded file into memory with simplexml_load_string(). The error above appears when loading a feed which is mostly in Spanish, and parsing breaks at the following XML node:
  <baños>2</baños>

  -> yields issue: PHP Warning: simplexml_load_string(): <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160

  should read

  <baños>2</baños>
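
To see the parser messages without digging through the PHP error log, the failure can be reproduced in isolation. This is only a sketch: load_xml_with_errors() is a hypothetical helper, and the sample string assumes the ñ arrived as the single ISO-8859-1 byte 0xF1, which may not match what the real feed contained.

```php
<?php
// Hypothetical helper (not from the original script): parse XML and
// collect libxml's parser messages instead of emitting PHP warnings.
function load_xml_with_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return [$doc, $messages];
}

// ASSUMPTION: "\xF1" is "ñ" in ISO-8859-1; on its own it is not valid UTF-8.
$sample = "<root><ba\xF1os>2</ba\xF1os></root>";

[$doc, $messages] = load_xml_with_errors($sample);

var_dump($doc === false); // whether the parse failed
print_r($messages);       // the collected parser messages
```

Collecting the messages this way makes it easier to log which line of the feed broke the parser.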

How?
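The fix is a two-step process, because my issue was with how the file was downloaded with cURL: decode the data as it is downloaded, then encode it again before passing it to simplexml_load_string(). The XML node should be baños.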

The initial command using cURL was:
  function get_data($url) {
      $ch = curl_init();
      $timeout = 5;
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
      $data = curl_exec($ch);
      curl_close($ch);
      return $data;
  }
  $file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
  $file_xml = simplexml_load_string( $file_content ); // doesn't work and returns a load of parser errors

The tweaked command using cURL is:
  function get_data($url) {
      $ch = curl_init();
      $timeout = 5;
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
      $data = utf8_decode(curl_exec($ch)); // note the utf8_decode function applied here
      curl_close($ch);
      return $data;
  }
  $file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
  $file_xml = simplexml_load_string( utf8_encode( $file_content ) ); // works! DONE! Stop reading any further and tell your boss it was always in hand.
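
One caveat for newer setups: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. The sketch below is an alternative, not the fix I used above. It swaps in mb_convert_encoding() and only converts when the data is not already valid UTF-8. The to_utf8() helper is hypothetical, and the ISO-8859-1 default is an assumption about the feed's real encoding that you would need to verify.

```php
<?php
// Hypothetical helper: convert downloaded data to UTF-8 only when needed.
// ASSUMPTION: data that is not valid UTF-8 is encoded as ISO-8859-1.
function to_utf8(string $data, string $assumedEncoding = 'ISO-8859-1'): string
{
    // preg_match() with /u returns 1 only for a valid UTF-8 subject.
    if (preg_match('//u', $data) === 1) {
        return $data; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($data, 'UTF-8', $assumedEncoding);
}

// Sample node with "ñ" as the single ISO-8859-1 byte 0xF1.
$latin1 = "<root><ba\xF1os>2</ba\xF1os></root>";

$xml = simplexml_load_string(to_utf8($latin1));

// "\u{F1}" is "ñ" written as a UTF-8 escape, so the lookup does not
// depend on how this source file itself is saved.
$name = "ba\u{F1}os";
echo $name, ' = ', (string) $xml->$name, PHP_EOL;
```

If the feed turns out to be Windows-1252 or something else, pass that as the second argument instead of relying on the default.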


Other things I tried but to no avail
The solution above was as easy as that. Here are a number of other things I tried first:
  • mysql_set_charset(): No
  • iconv(): No
  • htmlentities(): No
  • preg_replace_callback(): No
  • sxe(): No
  • $xml = simplexml_load_string( utf8_encode($rss) );: No. Oh wait, yes! Sort of. Don't forget the decode when downloading the XML.