Is it possible to convert all the comments to utf-8?


jdleung
05-10-2006, 08:05 AM
When I set the browser's encoding to gb2312, input a comment in Chinese, and then set the browser's encoding back to utf-8, the comment is displayed as a mess.

It means that when someone inputs something in their mother language (other than English) using a non-utf8 encoding, other visitors who use utf-8 will see the mess.

Is it possible to convert all the comments to utf-8, no matter what encoding and what language the visitor uses?
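The effect described above can be reproduced outside the browser. This is only an illustrative sketch in Python (not Pixelpost code); the sample comment text is an assumption chosen for the demo:

```python
# Illustration only: GB2312 bytes read back as if they were UTF-8.
comment = "你好"  # hypothetical comment typed by the visitor

# A browser forced to gb2312 submits the comment as these bytes.
submitted = comment.encode("gb2312")

# A page that assumes UTF-8 cannot decode them; invalid sequences
# are replaced with U+FFFD, which is the "mess" other visitors see.
shown = submitted.decode("utf-8", errors="replace")

print(submitted)
print(shown)
```

The bytes themselves are intact; they are simply interpreted with the wrong encoding, which is why decoding them as gb2312 again would recover the original text.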

Connie
05-10-2006, 01:23 PM
We encode every input we get as UTF-8,

but we must rely on what the browser "delivers".
If the web pages where data is entered into forms have the correct character set information (UTF-8) in the header, every well-behaved browser will use this character set.

The problem arises when the browser is not set to UTF-8 or "autodetect".

I wonder how people can enter Chinese characters when their browser is set to read the page as Italian? Everybody should notice that they are using the wrong character set, as the text on the page will look strange to them...

But this is a never-ending problem. UTF-8 must be better known and understood (especially in Internet Explorer's default settings etc.).

I know many websites with a lot of comments in Arabic where German or French text is displayed correctly as well, because the configuration is set to UTF-8.

You could point out to the user in your form template that the browser should be set to UTF-8 or autodetect to guarantee correct results.

We cannot convert the input to UTF-8 because we do not know which characters the user wants to enter.
We save everything in UTF-8 and we deliver everything in UTF-8.
But how to "convert"? The keyboard input is taken as it comes, and if the user has set it to Greek, it arrives in Greek, but we don't know that.

Some people suggest trying to identify the current character encoding from browser information, but who will do this job? I would not rely on any DOM input sent by an MS browser....
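One thing a server can do without knowing the original encoding is check whether the submitted bytes are valid UTF-8 at all. The sketch below is in Python purely for illustration; `is_valid_utf8` is a hypothetical helper, not part of Pixelpost. It only detects that something is wrong. It does not say which encoding the browser actually used.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same German text, submitted once as UTF-8 and once as Latin-1.
print(is_valid_utf8("Grüße".encode("utf-8")))
print(is_valid_utf8("Grüße".encode("latin-1")))
```

Pure ASCII input passes this check whatever the browser was set to, so the check only helps for text that contains non-ASCII characters.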

Joe[y]
05-10-2006, 01:42 PM
Pixelpost does as much as it can, but it is impossible to support every type of encoding. Unlike many programs, Pixelpost at least successfully uses one standard character encoding, but if a user chooses to override that in their browser, there is little we can do. There is no way to force a character encoding on a user. We can set the browser recommendation (for auto-detect) in a meta tag (which we do, and which we recommend in all templates), but if a user chooses to force iso8859 etc. on the page, then the process is screwed from the beginning. There are massive glitches in the internationalisation of the web.

jdleung
05-10-2006, 07:15 PM
Thanks Connie and Joe[y]

I know more about utf-8 now.

One more question: is it possible to write an addon to convert it? The addon would list many languages that could be forced to utf-8, and the website owner would decide which languages should be converted.

Something like below:
if (chinese==1){convert it}
if (italy==1){convert it}
if (germany==1){convert it}

just my thinking, :p
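A rough sketch of how such an addon might work, written in Python for illustration only (Pixelpost itself has no such function; `to_utf8` and the `fallbacks` list are hypothetical). It keeps input that is already valid UTF-8 and otherwise tries the encodings the site owner has chosen, in order:

```python
from typing import Sequence


def to_utf8(raw: bytes, fallbacks: Sequence[str] = ("gb2312",)) -> bytes:
    """Return raw re-encoded as UTF-8, guessing from an owner-chosen list."""
    # Input that is already valid UTF-8 is left untouched.
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    # Otherwise try each encoding the site owner enabled, in order.
    for encoding in fallbacks:
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep what we can and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


converted = to_utf8("你好".encode("gb2312"), fallbacks=["gb2312"])
print(converted.decode("utf-8"))
```

This is still a guess, which is the limitation raised earlier in the thread: single-byte encodings such as iso8859-1 accept almost any byte sequence, so a wrong entry early in the list would silently produce wrong text.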