danielrhodes
Multibyte string handling in PHP with the mbstring extension
2. What is mbstring for?
3. Supports many character encodings, including Unicode
4. Supports several national languages *
5. Character encoding conversion
6. Some Japanese-specific functions / settings
7. Mbstring is NOT...
8. How to get mbstring
9. On most PHP servers it's already there, so...
10. ...just switch it on!
11. Present and switched on out of the box in Zend Server (CE and upwards)
12. If it's not present, download it; you shouldn't need to compile anything
13. Some key directives for mbstring
14. mbstring.language
15. See http://php.net/manual/en/mbstring.configuration.php
16. Easy peasy in Zend Server
17. Enough now, let's rock and roll!
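The directives above also have runtime counterparts. A minimal sketch, assuming PHP 8 with mbstring enabled; the language, encoding and detect-order values are illustrative choices, not settings taken from the slides:

```php
<?php
// Sketch: runtime counterparts of some mbstring ini directives.
// Assumes PHP 8 with mbstring enabled; the values are illustrative.

// Counterpart of mbstring.language ('uni' selects the UTF-8 setting).
mb_language('uni');

// Counterpart of mbstring.internal_encoding: the default encoding
// used by mb_* functions when no encoding argument is given.
mb_internal_encoding('UTF-8');

// Counterpart of mbstring.detect_order: candidates tried by mb_detect_encoding().
mb_detect_order(['ASCII', 'UTF-8', 'SJIS', 'EUC-JP']);

echo 'language: ', mb_language(), PHP_EOL;
echo 'internal encoding: ', mb_internal_encoding(), PHP_EOL;
echo 'detect order: ', implode(', ', mb_detect_order()), PHP_EOL;
```

Setting these in php.ini (or in the Zend Server GUI) is usually tidier; the runtime calls are handy when you can't touch the server config.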
18. For example, we all know strlen()
19. So let's have a look at mb_strlen()
20. mb_strlen()
21. More mb_strlen()
22. Even more mb_strlen()
23. Still rocking and rolling...
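The mb_strlen() slides were code screenshots that did not survive extraction. A minimal sketch of the strlen() versus mb_strlen() comparison they set up, assuming the script file is saved as UTF-8 and mbstring is enabled (the sample strings are illustrative):

```php
<?php
// Sketch: strlen() counts bytes, mb_strlen() counts characters.
// Assumes this script is saved as UTF-8 and mbstring is enabled.
mb_internal_encoding('UTF-8');

$word = '日本語'; // 3 characters, each 3 bytes in UTF-8

echo strlen($word), PHP_EOL;             // 9 (bytes)
echo mb_strlen($word), PHP_EOL;          // 3 (characters, using the internal encoding)
echo mb_strlen($word, 'UTF-8'), PHP_EOL; // 3 (encoding given explicitly)

// The same text in Shift_JIS takes 2 bytes per character,
// so the byte count changes but the character count does not.
$sjis = mb_convert_encoding($word, 'SJIS', 'UTF-8');
echo strlen($sjis), PHP_EOL;             // 6
echo mb_strlen($sjis, 'SJIS'), PHP_EOL;  // 3
```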
24. So let's have a look at mb_strpos()
25. mb_strpos()
26. More mb_strpos()
27. Wrapping up and moving on
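Likewise for the mb_strpos() slides: a minimal sketch, assuming a UTF-8 source file and mbstring enabled, showing that strpos() returns a byte offset while mb_strpos() returns a character offset (sample text is illustrative):

```php
<?php
// Sketch: strpos() returns a byte offset, mb_strpos() a character offset.
// Assumes a UTF-8 source file and mbstring enabled.
mb_internal_encoding('UTF-8');

$text   = '東京タワー'; // "Tokyo Tower"
$needle = 'タワー';

$bytePos = strpos($text, $needle);    // 6: 東 and 京 are 3 bytes each
$charPos = mb_strpos($text, $needle); // 2: two characters precede the needle

echo $bytePos, ' ', $charPos, PHP_EOL; // 6 2

// Character offsets pair with mb_substr().
echo mb_substr($text, 0, $charPos), PHP_EOL; // 東京

// Cutting at an arbitrary byte offset with substr() can split a
// character in half and leave invalid UTF-8 behind.
var_dump(mb_check_encoding(substr($text, 0, 4), 'UTF-8')); // bool(false)
```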
28. BE CAREFUL, but you can make calls to strlen() (etc.) automatically call mb_strlen() instead: this is the mbstring.func_overload directive (deprecated in PHP 7.2 and removed in PHP 8.0)
29. Mbstring-specific functions
30. mb_convert_encoding()
31. LOTS of supported encodings
32. (http://php.net/manual/en/mbstring.supported-encodings.php)
33. The mbstring.detect_order directive comes into play here
34. mb_detect_encoding()
35. mb_detect_order()
36. More mb_detect_order()
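The mb_detect_encoding() and mb_detect_order() slides were also screenshots. A minimal sketch, assuming PHP 8 with mbstring enabled; detection is a heuristic guess, and the candidate lists here are illustrative rather than recommended:

```php
<?php
// Sketch: guessing the encoding of raw bytes. Detection is heuristic, so
// treat the result as a best guess; the candidate lists are illustrative.
$utf8 = 'こんにちは';
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Runtime counterpart of the mbstring.detect_order directive.
mb_detect_order(['ASCII', 'UTF-8', 'SJIS']);
echo implode(', ', mb_detect_order()), PHP_EOL;

// null => use the detect order set above; true => strict checking.
$guessUtf8 = mb_detect_encoding($utf8, null, true);
$guessSjis = mb_detect_encoding($sjis, null, true);

// Or pass a candidate list for this call only.
$guessAscii = mb_detect_encoding('hello', ['ASCII', 'UTF-8'], true);

var_dump($guessUtf8, $guessSjis, $guessAscii);
```

Closely related encodings (SJIS and EUC-JP, for example) can both "fit" the same bytes, which is why the order of candidates matters.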
42. mb_convert_encoding()
43. More mb_convert_encoding()
44. Regular expressions on multibyte strings
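For the mb_convert_encoding() slides, a minimal sketch assuming a UTF-8 source file and mbstring enabled; note the argument order is (string, to_encoding, from_encoding):

```php
<?php
// Sketch: converting between encodings with mb_convert_encoding().
// Argument order is (string, to_encoding, from_encoding).
$utf8 = 'こんにちは';

$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
echo bin2hex($sjis), PHP_EOL;  // 82b182f182c982bf82cd (2 bytes per character)

$eucjp = mb_convert_encoding($utf8, 'EUC-JP', 'UTF-8');
echo bin2hex($eucjp), PHP_EOL; // a4b3a4f3a4cba4c1a4cf

// Converting back restores the original bytes exactly.
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');
var_dump($back === $utf8); // bool(true)
```

The from_encoding argument can also be a list or 'auto', in which case mbstring detects the source encoding first, subject to the same guesswork as mb_detect_encoding().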
45. mb_ereg()
46. mb_ereg_match()
47. mb_ereg_replace()
48. and many more!
49. Note: PHP's regular preg_*() functions can also do UTF-8 with the /u pattern modifier!!
50. mb_ereg()
51. More mb_ereg()
52. Summary of mbstring functions
53. Multibyte versions of regular string functions
54. Regex functions
55. Encoding detection / conversion
56. Japanese-specific functions / settings
57. Other miscellaneous stuff
58. Putting it all together
59. BUT...
60. Don't forget your:
61. PHP script files (best to save them in the same encoding as mbstring.internal_encoding)
62. Database
63. Output (i.e. probably HTML)
64. Input (i.e. form submissions etc.)
65. Multibyting your database
66. PostgreSQL: I'm no expert, but IIRC Postgres automagically converts between the client and server character encodings
67. MySQL: you can choose a collation for the server, each schema, each table, even each column!
68. A MySQL collation means charset + sort order (for example, a _cs suffix means a case-sensitive sort order, _ci case-insensitive)
69. More multibyting your database
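70. You'll need to run an SQL query of:
71. SET NAMES utf8 and/or SET CHARACTER SET utf8 (on MySQL 5.5.3+, use utf8mb4 if you need characters outside the BMP, such as emoji; MySQL's utf8 stores at most 3 bytes per character)
72. After connecting and before reading / writing
73. (otherwise characters will become garbled)
74. Multibyting your output HTML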
75. Content-Type: "text/html; charset=UTF-8"
76. i.e. header("Content-Type: text/html; charset=UTF-8");
77. Possible but less desirable to output as a meta tag in the HTML:
78. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
79. <meta charset="UTF-8"> (or simply this for HTML5)
80. Don't forget lang="xy" or xml:lang="xy" where needed
81. Multibyting your input
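Back on the output side: a minimal sketch of sending UTF-8 HTML as the output slide describes, assuming a UTF-8 source file; the page content and the use of htmlspecialchars() with a matching charset are illustrative additions, not something from the slides:

```php
<?php
// Sketch: sending UTF-8 HTML. Assumes a UTF-8 source file;
// the page content is illustrative.

// The HTTP header must be sent before any output.
header('Content-Type: text/html; charset=UTF-8');

$name = '山田 <太郎>';

// Escape with the same charset you declared.
$safe = htmlspecialchars($name, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo '<!DOCTYPE html>', PHP_EOL;
echo '<html lang="ja">', PHP_EOL;
echo '<head><meta charset="UTF-8"><title>', $safe, '</title></head>', PHP_EOL;
echo '<body><p>', $safe, '</p></body>', PHP_EOL;
echo '</html>', PHP_EOL;
```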
82. Out of the box, form data on a SJIS host page comes in as SJIS, form data on an EUC-JP host page comes in as EUC-JP, and so on
83. Or have I just been very lucky?
84. Look at the mbstring.http_input directive if you're struggling
85. That's all folks!
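On the input side, one defensive option is to check incoming form text with mb_check_encoding() and convert it when it isn't UTF-8. A minimal sketch: utf8_input() is a hypothetical helper, and SJIS as the fallback encoding is an assumption (use whatever your legacy pages actually sent):

```php
<?php
// Sketch: normalise incoming form text to UTF-8.
// utf8_input() is a hypothetical helper; SJIS as the fallback
// encoding is an assumption.
function utf8_input(string $raw, string $fallback = 'SJIS'): string
{
    // Already valid UTF-8: pass it through unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise assume it arrived in the fallback encoding and convert it.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// In a real handler: $comment = utf8_input($_POST['comment'] ?? '');
$fromUtf8Page = 'テスト';
$fromSjisPage = mb_convert_encoding('テスト', 'SJIS', 'UTF-8'); // simulated SJIS form data

echo utf8_input($fromUtf8Page), PHP_EOL; // テスト
echo utf8_input($fromSjisPage), PHP_EOL; // テスト
```

Bytes that happen to be valid UTF-8 are passed through as-is, so this is a safety net rather than a substitute for serving your forms in a known encoding.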
86. The previous examples of preg_match() failing will probably work with the /u pattern modifier (to enable UTF-8)
87. No mb version of trim() (mb_trim() was only added later, in PHP 8.4) or preg_match_all()
88. Mbstring in action: http://twitter.com/japxlate http://mapanese.info
89. Questions welcome at [email protected]
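A workaround for the missing multibyte trim() and preg_match_all(), using the /u modifier. A minimal sketch: utf8_trim() is a hypothetical helper, and stripping the ideographic space U+3000 is an assumption about what "whitespace" should mean for Japanese text:

```php
<?php
// Sketch: trim() only strips ASCII whitespace bytes, so a full-width
// ideographic space (U+3000) survives. utf8_trim() is a hypothetical
// helper that strips \s plus U+3000 from both ends using a /u regex.
function utf8_trim(string $s): string
{
    return preg_replace('/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', $s);
}

$input = "\u{3000}こんにちは 世界\u{3000}";

var_dump(trim($input) === $input); // bool(true): trim() left U+3000 in place
echo utf8_trim($input), PHP_EOL;   // こんにちは 世界

// preg_match_all() with /u: "." matches whole characters, not bytes.
preg_match_all('/./u', '日本語', $chars);
echo implode(' | ', $chars[0]), PHP_EOL; // 日 | 本 | 語
```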