Java Converting Between Unicode Characters and Strings
Problem
You want to convert between Unicode characters and Strings.
Solution
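A Java char is a 16-bit UTF-16 code unit, so a single char can hold any Unicode
character in the Basic Multilingual Plane (U+0000 through U+FFFF); characters beyond
that range are stored as a pair of chars (a surrogate pair). The charAt( ) method of
String returns the char at a given index.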
The StringBuilder append( ) method has a form that accepts a char. Since char
is an integer type, you can even do arithmetic on chars, though this is not necessary
as frequently as in, say, C. Nor is it often recommended, since the Character class
provides the methods for which these operations were normally used in languages
such as C. Here is a program that uses arithmetic on chars to control a loop, and also
appends the characters into a StringBuilder (see Recipe 3.3):
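/**
 * Conversion between Unicode characters and Strings
 */
public class UnicodeChars {
    public static void main(String[] argv) {
        StringBuilder b = new StringBuilder();
        // Arithmetic on chars controls the loop: appends a, b, c
        for (char c = 'a'; c < 'd'; c++) {
            b.append(c);
        }
        b.append('\u00a5'); // Japanese Yen symbol
        b.append('\u01FC'); // Roman AE with acute accent
        b.append('\u0391'); // GREEK Capital Alpha
        b.append('\u03A9'); // GREEK Capital Omega
        // Print each accumulated character, then the whole string
        for (int i = 0; i < b.length(); i++) {
            System.out.println("Character #" + i + " is " + b.charAt(i));
        }
        System.out.println("Accumulated characters are " + b);
    }
}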
When you run it, the expected results are printed for the ASCII characters. On my
Unix system, the default fonts don’t include all the additional characters, so they are
either omitted or mapped to irregular characters (Recipe 13.3 shows how to draw
text in other fonts):
C:\javasrc\strings>java UnicodeChars
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is %
Character #4 is |
Character #5 is
Character #6 is )
Accumulated characters are abc%|)
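When the display garbles or drops a character, it can help to confirm what the String actually contains, independent of fonts. The following is only a minimal sketch: the class name EscapeNonAscii and the method escape are hypothetical names chosen for this illustration, not part of the recipe. It rewrites every char outside printable ASCII as a four-digit hexadecimal escape.

```java
class EscapeNonAscii {

    /**
     * Returns s with each char outside printable ASCII (0x20 to 0x7E)
     * replaced by a backslash, the letter u, and four hex digits.
     */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                out.append(c);                                  // printable ASCII passes through
            } else {
                out.append(String.format("\\u%04X", (int) c)); // numeric value of the char
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The same characters that UnicodeChars accumulates
        System.out.println(escape("abc\u00a5\u01FC\u0391\u03A9"));
    }
}
```

Because the result is pure ASCII, it should look the same on any console, which makes it easier to tell a display problem from a data problem.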
My Windows system doesn’t have most of those characters either, but at least it prints the ones it knows are lacking as question marks. (Windows system fonts are more homogeneous than those of the various Unix systems, so it is easier to know what won’t work.) On the other hand, it tries to print the Yen sign as a Spanish capital Enye (N with a ~ over it). Amusingly, if I capture the console log under Windows into a file and display it under Unix, the Yen symbol now appears:
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???
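The conversions named in the Solution, and its advice to prefer the Character class over arithmetic on chars, can be sketched as follows. This is an illustration under stated assumptions rather than the recipe's own code: the class CharStringConversions and its helper methods are hypothetical names, while String.valueOf, Character.toChars, Character.toUpperCase, charAt, and codePointAt are standard library calls. The last few lines show that a code point above U+FFFF occupies two char values.

```java
class CharStringConversions {

    /** char to String. */
    static String fromChar(char c) {
        return String.valueOf(c);
    }

    /** Code point to String; yields two chars for code points above U+FFFF. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Upper-cases each char with Character.toUpperCase instead of subtracting 32. */
    static String upperCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(Character.toUpperCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "abc\u03A9";
        char last = s.charAt(s.length() - 1);          // String to char
        System.out.println(fromChar(last).equals("\u03A9")); // true
        System.out.println(upperCase("abc1"));         // ABC1

        String clef = fromCodePoint(0x1D11E);          // MUSICAL SYMBOL G CLEF
        System.out.println(clef.length());             // 2
        System.out.println(clef.codePointCount(0, clef.length())); // 1
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e
    }
}
```

Unlike subtracting a fixed offset, Character.toUpperCase also handles letters outside ASCII, which is the main reason to prefer it.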