What?
A quick article to stop me running into this issue again. It covers importing an XML file that contains characters from a different language character set and loading it in PHP with simplexml_load_string(). The error I get looks something like this:
PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160
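The bytes quoted in the warning give a hint about what went wrong. As a rough sketch (it assumes the mbstring extension is available, and it assumes the feed was really sent as ISO-8859-1, which the warning itself does not confirm), this checks those four bytes and shows that they only become valid UTF-8 after a conversion:

```php
<?php
// The four bytes quoted in the warning: 0xA0 0x3C 0x2F 0x73
$bytes = "\xA0\x3C\x2F\x73";

// A lone 0xA0 is a UTF-8 continuation byte with no lead byte, so this check fails.
$is_valid_utf8 = mb_check_encoding($bytes, 'UTF-8');

// Assumption: the source encoding is ISO-8859-1, where 0xA0 is a non-breaking space.
$converted = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
$converted_is_valid = mb_check_encoding($converted, 'UTF-8');

var_dump($is_valid_utf8, $converted_is_valid);
```

If the second check passes on your own data, the feed is most likely in a single-byte encoding rather than UTF-8.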
Why?
I'm downloading an XML feed to our servers and then loading the downloaded file into memory with simplexml_load_string(). I get the above error when it attempts to load a feed which is mostly in Spanish, and it breaks at the following XML node:
    <baños>2</baños>
    -> yields issue: PHP Warning: simplexml_load_string(): <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160
    should read
    <baños>2</baños>
How?
It's a two-step process, because my issue was with how the file was downloaded with cURL: decode the data as it is downloaded, then encode it again when loading it. The XML node should be baños.
The initial command using cURL was:
    function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
    $file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
    $file_xml = simplexml_load_string( $file_content );  // doesn't work and returns a load of parser errors
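To read that load of parser errors in one place instead of as scattered warnings, libxml can be told to collect them. This is only a sketch: the $broken string is a made-up stand-in for the downloaded feed (a ñ stored as the single byte 0xF1), not the real feed content.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical stand-in for the downloaded feed: "ñ" as one non-UTF-8 byte (0xF1).
$broken = "<casa><ba\xF1os>2</ba\xF1os></casa>";

$file_xml = simplexml_load_string($broken);

$messages = [];
if ($file_xml === false) {
    // Each entry is a LibXMLError object with line, column and message properties.
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

print_r($messages);
```

The exact wording of the messages depends on the libxml version installed, so treat the output as a diagnostic aid rather than something to match against.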
The tweaked command using cURL is:
    function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = utf8_decode(curl_exec($ch));  // note the utf8_decode function applied here
        curl_close($ch);
        return $data;
    }
    $file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
    $file_xml = simplexml_load_string( utf8_encode( $file_content ) );  // works! DONE! Stop reading any further and tell your boss it was always in hand.
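One caveat for newer setups: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, and utf8_decode() replaces anything it cannot represent in ISO-8859-1 with a question mark. A possible alternative, sketched here rather than tested against the real feed, is to convert only when the data is not already valid UTF-8. The helper name to_utf8(), the ISO-8859-1 fallback and the sample string are all assumptions for illustration, and the sketch assumes the feed does not declare a different encoding in its XML prolog.

```php
<?php
// Hypothetical helper (not part of the original script): return $raw as UTF-8.
// Assumption: anything that is not valid UTF-8 is really ISO-8859-1.
function to_utf8(string $raw, string $assumed_source = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already fine, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumed_source);
}

// Stand-in for the downloaded feed: "ñ" stored as the single byte 0xF1.
$file_content = "<casa><ba\xF1os>2</ba\xF1os></casa>";

$file_xml = simplexml_load_string(to_utf8($file_content));

// Read the first child back to confirm the node name and value survived.
$first = $file_xml->children()[0];
$node_name = $first->getName();
$value = (string) $first;

echo $node_name . ' = ' . $value . PHP_EOL;
```

In the script above, this would mean returning curl_exec($ch) unchanged from get_data() and wrapping $file_content in to_utf8() before it reaches simplexml_load_string().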
Other things I tried but to no avail
The solution above was as easy as that. Here are a number of other things I tried first:
- mysql_set_charset(): No
- iconv(): No
- htmlentities(): No
- preg_replace_callback(): No
- sxe(): No
- $xml = simplexml_load_string( utf8_encode($rss) );: No. Oh wait, yes! Sort of. Don't forget the decode when downloading the XML.