PHP Internationalization and Localization
Introduction
Though everyone who programs in PHP has to learn some English eventually to get a handle on its function names and language constructs, PHP can create applications that speak just about any language. Some applications need to be used by speakers of many different languages. Taking an application written for French speakers and making it useful for German speakers is made easier by PHP’s support for internationalization and localization.
The recipes in this chapter rely on the capabilities of PHP’s intl extension for internationalization and localization tasks. Underlying this extension is the powerful ICU library. ICU is widely used and has both C/C++ and Java implementations. This means that the concepts you learn while working with it in PHP translate well if you are doing globalization work in other programming languages.
The intl extension is bundled with PHP versions 5.3.0 and later. To use it with PHP 5.2.0 or later, install it from PECL.
Internationalization (often abbreviated I18N) is the process of taking an application designed for just one locale and restructuring it so that it can be used in many different locales. Localization (often abbreviated L10N) is the process of adding support for a new locale to an internationalized application.
A locale is a group of settings that describe text formatting and language customs in a particular area of the world. Locales describe behavior for:
Collation
How text is sorted: which letters go before and after others in alphabetical order.
Numbers
How numeric information (including currency amounts) is displayed, including
how to group digits, what characters to use as the thousands separator and decimal
point, and how to indicate negative amounts.
Times and Dates
How time and date information is formatted and displayed, such as names of
months and days and whether to use 24- or 12-hour time.
Messages
Text messages used by applications that need to display information in multiple
languages.
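As a rough illustration of how the same values come out differently per locale, here is a minimal sketch. It assumes the intl extension is loaded; describeLocale() is a hypothetical helper name, and the sample amount and date are arbitrary.

```php
<?php
// Hypothetical helper: format one number and one date for a given locale.
// Assumes the intl extension (NumberFormatter, IntlDateFormatter) is available.
function describeLocale(string $locale, float $amount, DateTimeInterface $when): array
{
    // DECIMAL style applies the locale's digit grouping and decimal separator.
    $numbers = new NumberFormatter($locale, NumberFormatter::DECIMAL);
    // LONG date and SHORT time pick locale-specific month names and clock style.
    $dates = new IntlDateFormatter($locale, IntlDateFormatter::LONG, IntlDateFormatter::SHORT, 'UTC');

    return [
        'number' => $numbers->format($amount),
        'date'   => $dates->format($when),
    ];
}

$when = new DateTime('2024-03-05 14:30:00', new DateTimeZone('UTC'));
foreach (['en_US', 'de_DE', 'fr_FR'] as $locale) {
    $out = describeLocale($locale, -1234567.891, $when);
    echo $locale, ': ', $out['number'], ' | ', $out['date'], "\n";
}
```

The exact strings depend on the ICU data installed with PHP, so treat the printed output as something to inspect rather than hardcode.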
A locale ID has a few components, each separated by underscores. The first is the language code, an abbreviation that indicates a language. This is, for example, “en” for English or “pt” for Portuguese. The language codes are the two-letter codes specified in the ISO 639-1 standard.
Next comes an optional script code, which indicates what set of characters should be used to represent text in this locale. For example, Arab indicates Arabic and Cyrl indicates Cyrillic. These script codes are enumerated as part of ISO 15924.
After that comes an optional country code, to distinguish between different countries that speak different versions of the same language. For example, “en_US” for US English and “en_GB” for British English, or “pt_BR” for Brazilian Portuguese and “pt_PT” for Portuguese Portuguese. The country codes are the two-letter codes specified in the ISO 3166 standard.
To further allow for specifying differences among the same language and country, the next component of a locale ID can be an optional variant code. These variant codes, documented in the IANA language subtag registry, indicate variations such as using the Biscayan dialect of Basque (variant biscayan), or that the Høgnorsk orthography of Norwegian should be used (variant hognorsk). Your basic day-to-day use of locales will probably not involve variants.
After the variant comes an optional list of keywords, prefixed by an @. These keywords are semicolon-separated name=value pairs that offer a further way to provide customized information about the locale. For example, the locale fr_CA@currency=USD indicates a French-language locale in Canada, but using US dollars for currency. Useful for merchants on the Quebec-Vermont border, perhaps.
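The components described above can be taken apart and reassembled with the intl extension's Locale class. The following is a small sketch, assuming intl is loaded; the locale IDs used are only examples.

```php
<?php
// Split a locale ID into its language, script, and region components.
$parts = Locale::parseLocale('sr_Cyrl_RS');
print_r($parts);   // keys: language, script, region

// Keywords after the @ are read separately, as a name => value array.
$keywords = Locale::getKeywords('fr_CA@currency=USD');
print_r($keywords);

// Build a locale ID from individual components.
$composed = Locale::composeLocale(['language' => 'pt', 'region' => 'BR']);
echo $composed, "\n";
```

Components that are absent from the ID (a script or variant, for instance) are simply missing from the array that parseLocale() returns.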
To help you deal with locales, the recipes in this chapter demonstrate how to set the locale as asked for by a user’s web browser.
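One possible way to honor the browser's preference is to pass its Accept-Language header to Locale::acceptFromHttp(). The sketch below assumes the intl extension is loaded; chooseLocale() and the en_US fallback are hypothetical choices for illustration, not something this chapter prescribes.

```php
<?php
// Hypothetical helper: pick a locale from an Accept-Language header value,
// falling back to a default when the header is missing or unusable.
function chooseLocale(?string $acceptLanguage, string $fallback = 'en_US'): string
{
    if ($acceptLanguage === null || trim($acceptLanguage) === '') {
        return $fallback;
    }
    // acceptFromHttp() returns the best match it can find, or false on failure.
    $locale = Locale::acceptFromHttp($acceptLanguage);

    return ($locale === false || $locale === '') ? $fallback : $locale;
}

// In a web request the header arrives in $_SERVER['HTTP_ACCEPT_LANGUAGE'].
$locale = chooseLocale($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null);
Locale::setDefault($locale);
```

Because the header is user-supplied, an application that only supports a few locales would probably still check the result against its own list before using it.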
Different techniques are necessary for correct localization of plain text, numbers, dates and times, and currency. Localization can also be applied to external entities your program uses, such as images and included files. The recipes in this chapter cover localizing these kinds of content.
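For plain text, one common approach is to keep a message pattern per locale and let MessageFormatter fill in the values. This is a minimal sketch under that assumption; the $catalog array, the localizedMessage() helper, and the sample strings are all hypothetical.

```php
<?php
// Hypothetical in-memory message catalog, keyed by locale ID.
$catalog = [
    'en_US' => 'You have {0, number, integer} new messages.',
    'de_DE' => 'Sie haben {0, number, integer} neue Nachrichten.',
];

// Hypothetical helper: look up the pattern for a locale and format it.
function localizedMessage(array $catalog, string $locale, array $args): string
{
    // Fall back to the en_US pattern when the locale has no translation.
    $pattern = $catalog[$locale] ?? $catalog['en_US'];
    $text = MessageFormatter::formatMessage($locale, $pattern, $args);

    // formatMessage() returns false on error; show the raw pattern in that case.
    return $text === false ? $pattern : $text;
}

echo localizedMessage($catalog, 'en_US', [1234]), "\n";
echo localizedMessage($catalog, 'de_DE', [1234]), "\n";
```

The number inside each message is formatted by the same locale rules as the surrounding text, so the digit grouping changes along with the language.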
The recipes also cover locale-aware sorting and dealing with large amounts of localization data.
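To see why locale-aware sorting matters, compare PHP's byte-order sort() with the intl extension's Collator. This sketch assumes intl is loaded and that the source file is saved as UTF-8; sortForLocale() is a hypothetical helper and the word list is arbitrary.

```php
<?php
// Hypothetical helper: return a copy of $words sorted by the locale's collation rules.
function sortForLocale(array $words, string $locale): array
{
    $collator = new Collator($locale);
    $collator->sort($words);   // sorts the local copy in place

    return $words;
}

$words = ['zebra', 'Äpfel', 'apple'];

// Byte-order sort: the multibyte "Ä" lands after every ASCII letter.
$byBytes = $words;
sort($byBytes);
print_r($byBytes);

// Collator sort: "Ä" is ordered alongside "a", as a German reader expects.
print_r(sortForLocale($words, 'de_DE'));
```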
The recipes also discuss how to make sure your programs work well with a variety of character encodings so they can handle strings such as à l’Opéra-Théâtre, поленика, and 優之良品. One way to do this is to have all text your programs process be encoded as UTF-8. This encoding scheme can handle the Western characters in the familiar ISO-8859-1 encoding as well as characters for other writing systems around the world. These recipes focus on using UTF-8 to provide a seamless, language-independent experience for your users.
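A small sketch of what standardizing on UTF-8 can look like in practice follows. It assumes the mbstring extension is loaded; toUtf8() is a hypothetical helper, and treating non-UTF-8 input as ISO-8859-1 is an assumption made only for this example.

```php
<?php
// Hypothetical helper: return $bytes unchanged if it is already valid UTF-8,
// otherwise convert it from an assumed legacy encoding.
function toUtf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}

$utf8   = 'Opéra';        // UTF-8 source file: "é" takes two bytes
$latin1 = "Op\xE9ra";     // the same word as ISO-8859-1 bytes

echo strlen($utf8), ' bytes, ', mb_strlen($utf8, 'UTF-8'), " characters\n";
echo toUtf8($latin1) === $utf8 ? "converted\n" : "mismatch\n";
```

Byte-oriented functions such as strlen() count bytes, which is why the multibyte-aware mb_ functions are used wherever character counts matter.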