Foreign Characters create symbols in PHP and MySQL

Applies to:
  • PHP 5.4
  • MySQL 5.1.4
  • Databases initially set up with collation 'latin1_swedish_ci'

What?
Another article to help people display foreign characters on their website without the question mark in a diamond symbol, and a record of how I solved it in my case.

Why?
My company has started using international country and region names which include foreign characters. When we copy and paste these names into our website, our web pages display a question mark inside a diamond shape instead of the foreign character.

Have I tried the other solutions on the web?
Yes. I tried each of the following, and none of them fixed the problem in my case:
  • Setting the mbstring encodings in PHP (this does nothing):
    // add these to the header: DOESN'T WORK
    // note: mb_http_input() only reports the detected input encoding, it cannot set one
    mb_internal_encoding('UTF-8');
    mb_http_output('UTF-8');
    mb_http_input('UTF-8');
  • Declaring the charset in the HTML head (this does nothing):
    <!-- add the following to your header -->
    <meta charset='utf-8'>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • Changing the database collation and the connection charset (this does nothing):
    -- set the collation of the database and any text fields to 'utf8_general_ci'
    -- (ALTER DATABASE only changes the default for tables created afterwards)
    ALTER DATABASE db_name CHARACTER SET utf8 COLLATE utf8_general_ci;
    -- set the connection character set to utf8
    SET NAMES 'utf8';
  • Converting the foreign characters to HTML entities in PHP (this does nothing):
    // convert the foreign characters to HTML entities using PHP
    $my_description = htmlentities($my_description);
    $my_description = htmlspecialchars($my_description);

Did I miss anything?
None of the above worked. If there is another solution out there, I didn't find it; had I found one that worked, I'd have included it in this article.

How?
The fix is in PHP and has to do with the versions of PHP and MySQL involved. As quoted on the PHP manual page for htmlspecialchars:
"As of PHP 5.4 they changed default encoding from "ISO-8859-1" to "UTF-8". So if you get null from htmlspecialchars or htmlentities..." Source: PHP Manual

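To see what the quoted change means in practice, here is a minimal sketch. It assumes the text reaches PHP as ISO-8859-1 bytes, as it would over a latin1 connection. The sample string and variable names are made up for illustration, and the flags are passed explicitly because the default flags differ between PHP versions.

```php
<?php
// "café" as ISO-8859-1 bytes: the é is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9";

// Treated as UTF-8 (the default since PHP 5.4), the input is invalid,
// so htmlentities() returns an empty string.
$as_utf8 = htmlentities($latin1, ENT_COMPAT, 'UTF-8');

// Told the real encoding, it produces the expected entity.
$as_latin1 = htmlentities($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($as_utf8 === '');              // bool(true)
var_dump($as_latin1 === 'caf&eacute;'); // bool(true)
```

This is why the extra parameters matter: without them the function guesses the wrong encoding for latin1 data.
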
My fix (in some cases):
$my_description = html_entity_decode(htmlentities($my_description, ENT_COMPAT, 'ISO-8859-1', true), ENT_COMPAT, 'UTF-8');
A more recent version, using ISO-8859-15 as the source encoding:
$my_description = html_entity_decode(htmlentities($my_description, ENT_COMPAT, 'ISO-8859-15', true), ENT_COMPAT, 'UTF-8');
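The one-liner above can be wrapped in a function so that text which is already valid UTF-8 is left alone, which avoids converting the same string twice. This is only a sketch: the function name latin1_to_utf8 is hypothetical, and the validity check with preg_match('//u', ...) is an addition for illustration, not part of the original fix.

```php
<?php
// Hypothetical helper: converts ISO-8859-1 text to UTF-8 using the
// htmlentities()/html_entity_decode() round trip.
function latin1_to_utf8($text)
{
    // preg_match() with the /u modifier returns 1 only when the subject
    // is valid UTF-8, so such text is returned unchanged.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // Otherwise assume ISO-8859-1: encode to entities, then decode as UTF-8.
    $entities = htmlentities($text, ENT_COMPAT, 'ISO-8859-1', true);
    return html_entity_decode($entities, ENT_COMPAT, 'UTF-8');
}

// "Curaçao" with the ç as the ISO-8859-1 byte 0xE7
$my_description = latin1_to_utf8("Cura\xE7ao");
```

One limitation of this approach: a latin1 string that happens to form valid UTF-8 byte sequences would be passed through unconverted, so it is best used where the source encoding is otherwise known.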
My other fix (for Croatian characters and Western languages):
// insert after database connection and prior to database query (where $db_conn is your mysqli connection)
mysqli_set_charset($db_conn, "utf8");

// round-trip the text through HTML entities, treating it as UTF-8
$my_description = html_entity_decode(htmlentities($my_description, ENT_COMPAT, 'UTF-8', true), ENT_COMPAT, 'UTF-8');
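The connection step can be sketched with some error checking added. The function name open_utf8_connection and the credentials in the usage comment are hypothetical placeholders; only mysqli_connect, mysqli_set_charset, mysqli_connect_error, mysqli_error and mysqli_character_set_name are real mysqli functions. It uses 'utf8' to match the MySQL version this article applies to.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to utf8
// before any query runs. All four arguments are placeholders.
function open_utf8_connection($host, $user, $password, $database)
{
    $db_conn = mysqli_connect($host, $user, $password, $database);
    if (!$db_conn) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // mysqli_set_charset() returns false on failure, e.g. an unknown charset.
    if (!mysqli_set_charset($db_conn, 'utf8')) {
        throw new RuntimeException('Charset not set: ' . mysqli_error($db_conn));
    }
    return $db_conn;
}

// Usage (commented out because it needs a real server):
// $db_conn = open_utf8_connection('localhost', 'db_user', 'db_pass', 'db_name');
// echo mysqli_character_set_name($db_conn);
```

Doing this once, right after connecting, means every later query on that connection sends and receives UTF-8.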

Other things to consider:
  • Save all PHP files as UTF-8 and not ANSI
  • Ensure all text fields in the database use a UTF-8 charset and collation (e.g. utf8 with utf8_general_ci)
  • Ensure headers are setting the charset to UTF-8
  • Add the accept-charset attribute to HTML forms that post data: accept-charset="utf-8"
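
The header, form and escaping points above can be put together in one place. A rough sketch, assuming the page outputs HTML and the PHP file itself is saved as UTF-8; the function names send_utf8_header and e are hypothetical.

```php
<?php
// Hypothetical helper: declare UTF-8 in the HTTP header, if still possible.
function send_utf8_header()
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Hypothetical helper: escape text for HTML with the flags and encoding
// stated explicitly instead of relying on the PHP version's defaults.
function e($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

send_utf8_header();
echo '<form method="post" accept-charset="utf-8">', "\n";
echo '<p>', e("Côte d'Ivoire"), '</p>', "\n";
echo '</form>', "\n";
```

Passing the encoding on every call is repetitive, which is the reason for wrapping it in a short helper.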

Conclusion:
The final fix in my case came down to specifying the extra parameters (flags and encoding) on the htmlentities and htmlspecialchars PHP functions. My database stored the foreign characters as they were, so the fault lay somewhere between PHP reading from MySQL and displaying the characters on the web page.



Kind Regards,

Joel Lipman
www.joellipman.com
