Welcome to the Core Java™ Technologies Tech Tips, January 10, 2003. Here you'll get tips on using core Java technologies and APIs, such as those in Java 2 Platform, Standard Edition (J2SE™).
This issue covers:
Using Charsets and Encodings
Using Reflection To Create Class Instances
These tips were developed using Java 2 SDK, Standard Edition, v 1.4.
This issue of the Core Java Technologies Tech Tips is written by Glen McCluskey.


USING CHARSETS AND ENCODINGS

Suppose that you're doing some Java programming and need to write characters to a file:
import java.io.*;
public class Encode1 {
    public static void main(String args[])
        throws IOException {
        Writer writer = new FileWriter("out");
        writer.write("testing");
        writer.close();
    }
}
When you run this program in the United States in the Solaris™ Operating Environment or on the Windows platform, the result is a text file "out" of 7 bytes. This is what you would expect.
But there is an important issue here. Java characters are 16-bit, that is, each character is two bytes long. The Encode1 program writes a 7-character string to a file, and the result is a 7-byte file. You might ask: what happened to the other bytes, shouldn't there be 14 bytes written?
This issue falls under the title "character encodings". The problem is how to map between 16-bit characters representing Java data, and 8-bit bytes stored in data files. And in fact, it's trickier than simply "widening" or "narrowing" the character between 8 and 16 bits because there are literally hundreds of different character encoding schemes in use around the world. This means that the specific sequence of 8-bit bytes needed to represent a particular Java string changes from platform to platform and from locale to locale.
The Java system solves this problem by letting you choose the encoding scheme to use when writing out characters. It also provides a reasonable default encoding based on your platform and locale, which is what the I/O in the example above relies on. You can also specify other named encodings ("charsets"). These encodings are described by string names, such as "UTF-8", and by instances of the java.nio.charset.Charset class. Charset is abstract, so the actual instances are objects of subclasses of Charset.
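To make the point about differing byte sequences concrete, here is a minimal sketch that encodes one string with three of the standard charsets. The class name CharsetBytes and its helper methods are made up for illustration, and the sample text is an assumption chosen because it contains one non-ASCII character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class CharsetBytes {
    // Encode a string with the named charset and return the raw bytes.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer buf = Charset.forName(charsetName).encode(text);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // Render bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int v = bytes[i] & 0xFF;
            if (v < 0x10) {
                sb.append('0');
            }
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // three ASCII letters followed by e-acute (U+00E9)
        String text = "caf\u00e9";
        String[] names = { "ISO-8859-1", "UTF-8", "UTF-16BE" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(
                names[i] + ": " + hex(encode(text, names[i])));
        }
        // ISO-8859-1: 63 61 66 e9
        // UTF-8: 63 61 66 c3 a9
        // UTF-16BE: 00 63 00 61 00 66 00 e9
    }
}
```

The same four characters become 4, 5, or 8 bytes depending on the charset, which is why a reader has to know which encoding a file was written with.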
In the Encode1 example, one way of solving the encoding problem is to always write two bytes out for each character. However, the file will have null bytes interspersed. Another approach is to throw away the high byte of each Java character. This will work in the example above, but it wouldn't work if you tried to write a string of Greek or Japanese instead.
What actually happens in this example is that the second approach is used -- the high byte is discarded. If you change the output line in the Encode1 program from:
writer.write("testing");
to:
writer.write("testing\u1234");
the total output length will be 8 bytes instead of 7, even though the Unicode character \u1234 cannot be represented using a single byte.
"Discard" in the previous discussion can have a couple of meanings. If the high byte of a Java character is 0, as is the case for characters representing 7-bit ASCII, then discard means to omit the high byte. However, another meaning applies to the situation where you have a Java character that is not mappable using a particular encoding. In such a case the character (two bytes) may be replaced by a default substitution byte. In the case above, \u1234 is replaced with 0x3f.
Let's now look at how to use charsets, which are mappings between characters and bytes. One basic question you might have is: what charsets are available? Here's a program that displays a list:
import java.nio.charset.*;
import java.util.*;
public class Encode2 {
    public static void main(String args[]) {
        Map availcs = Charset.availableCharsets();
        Set keys = availcs.keySet();
        for (Iterator iter =
            keys.iterator(); iter.hasNext();) {
            System.out.println(iter.next());
        }
    }
}
The output should look something like this (but without the "*" character):
ISO-8859-1*
ISO-8859-15
US-ASCII*
UTF-16*
UTF-16BE*
UTF-16LE*
UTF-8*
windows-1252
The "*" is shown here to identify charsets that must be supported on all Java platforms.
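If you only need to confirm that the required charsets are present, Charset.isSupported avoids walking the whole list. This is a small sketch; the class name RequiredCharsets and the REQUIRED array are made up for illustration, with the names taken from the starred entries above.

```java
import java.nio.charset.Charset;

class RequiredCharsets {
    // The charsets marked with "*" in the listing above.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8",
        "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Count how many of the required charsets this platform reports
    // as supported.
    static int countSupported() {
        int count = 0;
        for (int i = 0; i < REQUIRED.length; i++) {
            if (Charset.isSupported(REQUIRED[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSupported() + " of " +
            REQUIRED.length + " required charsets supported");
    }
}
```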
Another basic question: what is the default charset on my local system? Here's a program that displays the name of the default:
import java.io.*;
import java.nio.charset.*;
public class Encode3 {
    public static void main(String args[])
        throws IOException {
        FileWriter filewriter =
            new FileWriter("out");
        String encname =
            filewriter.getEncoding();
        filewriter.close();
        System.out.println(
            "default charset is: " + encname);
        /*
        Charset charset1 =
            Charset.forName(encname);
        Charset charset2 =
            Charset.forName("windows-1252");
        if (charset1.equals(charset2)) {
            System.out.println(
                "Cp1252/windows-1252 equal");
        }
        else {
            System.out.println(
                "Cp1252/windows-1252 unequal");
        }
        */
    }
}
When you run this program, you might see a result like this:
default charset is: Cp1252
Notice that this charset is not on the list of required charsets that every Java implementation must support. There is no requirement that the default charset must be one of the required charsets. This example also has some commented-out logic that shows how you can determine whether two charsets are equal or not. It turns out that "windows-1252" and "Cp1252" are in fact names for a single charset. The logic is commented out because there is no requirement that the Cp1252 charset be supported, and so the logic here might not be meaningful to you.
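The idea that one charset can answer to several names can be checked on any platform by using a required charset instead of Cp1252. The following sketch asks a Charset for its aliases and confirms that each alias resolves back to an equal Charset object. The class name CharsetAliases is hypothetical, and the set of aliases printed depends on the Java implementation.

```java
import java.nio.charset.Charset;

class CharsetAliases {
    // Returns true if every alias of the charset looks up to an
    // object equal to the charset itself.
    static boolean aliasesResolveTo(Charset cs) {
        for (String alias : cs.aliases()) {
            if (!Charset.forName(alias).equals(cs)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("ISO-8859-1");
        System.out.println("canonical name: " + cs.name());
        System.out.println("aliases: " + cs.aliases());
        System.out.println("all aliases equal: " +
            aliasesResolveTo(cs));
    }
}
```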
You may have seen other ways to get the default local charset name, such as querying the "file.encoding" system property. This approach might work, but this property is not guaranteed to be defined on all Java platforms.
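As a rough sketch of the two approaches side by side, the code below reads the default encoding name from a writer that wraps an in-memory stream (so no file is created) and also queries the "file.encoding" property with a fallback for platforms where it is undefined. The class name DefaultEncoding and the fallback string are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

class DefaultEncoding {
    // Name of the encoding a writer picks when none is specified.
    static String fromWriter() {
        OutputStreamWriter writer =
            new OutputStreamWriter(new ByteArrayOutputStream());
        // must be read before close(); afterwards it may be null
        String name = writer.getEncoding();
        try {
            writer.close();
        }
        catch (IOException exc) {
            // nothing was written, so there is nothing to lose
        }
        return name;
    }

    // Value of the file.encoding property, or the fallback if the
    // property is not defined on this platform.
    static String fromProperty(String fallback) {
        return System.getProperty("file.encoding", fallback);
    }

    public static void main(String[] args) {
        System.out.println("from writer:   " + fromWriter());
        System.out.println("from property: " +
            fromProperty("(not defined)"));
    }
}
```

The two names may be spelled differently even when they refer to the same charset, since the writer can report a historical name.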
In the Encode3 program, Charset.forName is used to find the Charset object for a string name such as "US-ASCII". Here's another example that uses this technique:
import java.nio.charset.*;
public class Encode4 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.out.println(
                "missing charset name");
            System.exit(1);
        }
        String charsetname = args[0];
        Charset charset;
        try {
            charset = Charset.forName(charsetname);
            System.out.println(
                "charset lookup successful");
        }
        catch (UnsupportedCharsetException exc) {
            System.out.println(
                "unknown charset: " + charsetname);
        }
    }
}
If you run the program, like this:
$ java Encode4 XYZ
it will check whether "XYZ" is a supported Charset on the local system, and if so, obtain the Charset object.
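One detail worth knowing is that Charset.forName can also throw IllegalCharsetNameException when the name contains characters that are not legal in a charset name, such as a space. A minimal sketch of a lookup helper that handles both cases follows; the class name CharsetLookup and the choice to return null are assumptions, not part of the Charset API.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class CharsetLookup {
    // Returns the Charset for the given name, or null if the name
    // is syntactically illegal or not supported on this platform.
    static Charset lookup(String name) {
        try {
            return Charset.forName(name);
        }
        catch (IllegalCharsetNameException exc) {
            return null;   // e.g. name contains a space
        }
        catch (UnsupportedCharsetException exc) {
            return null;   // legal name, but no such charset here
        }
    }

    public static void main(String[] args) {
        String[] names = { "UTF-8", "no-such-charset-xyz", "bad name!" };
        for (int i = 0; i < names.length; i++) {
            Charset cs = lookup(names[i]);
            System.out.println(names[i] + " -> " +
                (cs == null ? "not available" : cs.name()));
        }
    }
}
```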
Given all this background, how do you actually make use of charsets? Here's a rework of the first example, Encode1:
import java.io.*;
public class Encode5 {
    public static void main(String args[])
        throws IOException {
        FileOutputStream fileoutstream =
            new FileOutputStream("out");
        Writer writer = new OutputStreamWriter(
            fileoutstream, "UTF-8");
        writer.write("testing");
        writer.close();
    }
}
The Encode1 program is not portable. It applies the default charset, which can vary based on platform and locale. By contrast, the Encode5 program uses a standard charset (UTF-8). As mentioned earlier, the default encoding used in the Encode1 example discards the high byte of Java characters. Using the UTF-8 encoding solves this problem. If you change the output line in the Encode5 program from:
writer.write("testing");
to:
writer.write("testing\u1234");
it still works. And UTF-8 has the advantage of handling 7-bit ASCII in a graceful way.
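To see that nothing is lost, you can write the string as UTF-8 and read it back with a matching reader. The sketch below uses in-memory streams instead of the file "out" so that it has no side effects; the class name Utf8RoundTrip and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class Utf8RoundTrip {
    // Write the text through a UTF-8 writer and return the bytes.
    static byte[] write(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(out, "UTF-8");
            writer.write(text);
            writer.close();
            return out.toByteArray();
        }
        catch (IOException exc) {
            throw new IllegalStateException(exc);
        }
    }

    // Read the bytes back through a UTF-8 reader.
    static String read(byte[] bytes) {
        try {
            Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), "UTF-8");
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            reader.close();
            return sb.toString();
        }
        catch (IOException exc) {
            throw new IllegalStateException(exc);
        }
    }

    public static void main(String[] args) {
        String text = "testing\u1234";
        byte[] bytes = write(text);
        // 7 single-byte ASCII characters plus 3 bytes for \u1234
        System.out.println("bytes written = " + bytes.length);
        System.out.println("round trip ok = " +
            text.equals(read(bytes)));
    }
}
```

The ASCII characters still take one byte each, which is the "graceful" handling of 7-bit ASCII mentioned above.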
Here's another example. It shows how you can convert Java strings to byte vectors, specifying an encoding:
import java.io.*;
public class Encode6 {
    public static void main(String args[])
        throws UnsupportedEncodingException {
        String str = "testing";
        byte bytevec1[] = str.getBytes();
        byte bytevec2[] = str.getBytes("UTF-16");
        System.out.println("bytevec1 length = " +
            bytevec1.length);
        System.out.println("bytevec2 length = " +
            bytevec2.length);
    }
}
The output on your system should look something like this:
bytevec1 length = 7
bytevec2 length = 16
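The first conversion applies the default charset. The second conversion uses the UTF-16 charset, whose encoder writes a two-byte byte-order mark ahead of the 14 bytes of character data. That is why the length is 16 rather than 14.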
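A short sketch of where the 16 bytes come from: it compares the "UTF-16" charset with "UTF-16BE" and "UTF-16LE", which fix the byte order and therefore write no byte-order mark. The class name Utf16Bom is hypothetical.

```java
import java.io.UnsupportedEncodingException;

class Utf16Bom {
    // Encode with a named charset, converting the checked exception
    // so that callers don't have to declare it.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        }
        catch (UnsupportedEncodingException exc) {
            throw new IllegalArgumentException(charsetName);
        }
    }

    public static void main(String[] args) {
        String str = "testing";
        byte[] utf16 = encode(str, "UTF-16");
        System.out.println("UTF-16 length   = " + utf16.length);
        System.out.println("first two bytes = " +
            Integer.toHexString(utf16[0] & 0xFF) + " " +
            Integer.toHexString(utf16[1] & 0xFF));
        System.out.println("UTF-16BE length = " +
            encode(str, "UTF-16BE").length);
        System.out.println("UTF-16LE length = " +
            encode(str, "UTF-16LE").length);
    }
}
```

When encoding, Java's UTF-16 charset writes a big-endian mark (fe ff), so the first two bytes of the result are not part of the text itself.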
There's one final thing to discuss about character encodings. You might wonder what a typical mapping or encoding algorithm really looks like. Here is some actual code taken from DataOutputStream.writeUTF. It's used to map a character vector into a byte vector:
for (int i = 0; i < strlen; i++) {
    c = charr[i];
    if ((c >= 0x0001) && (c <= 0x007F)) {
        bytearr[count++] = (byte) c;
    }
    else if (c > 0x07FF) {
        bytearr[count++] =
            (byte) (0xE0 | ((c >> 12) & 0x0F));
        bytearr[count++] =
            (byte) (0x80 | ((c >> 6) & 0x3F));
        bytearr[count++] =
            (byte) (0x80 | ((c >> 0) & 0x3F));
    }
    else {
        bytearr[count++] =
            (byte) (0xC0 | ((c >> 6) & 0x1F));
        bytearr[count++] =
            (byte) (0x80 | ((c >> 0) & 0x3F));
    }
}
Characters are taken from charr, converted into 1-3 bytes, and written into bytearr. Characters in the range 0x1 - 0x7f (7-bit ASCII) are mapped into themselves. Characters with value 0x0 and in the range 0x80 - 0x7ff are mapped into two bytes. All other characters are mapped into three bytes.
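The mapping just described can be wrapped in a small method and checked against DataOutputStream.writeUTF itself. This is a sketch, not the library's code: the class name ModifiedUtf8 and both helper methods are made up, and the comparison skips the two-byte length prefix that writeUTF puts in front of the encoded characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ModifiedUtf8 {
    // Apply the 1-, 2-, and 3-byte rules described above.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c);                          // 7-bit ASCII
            }
            else if (c > 0x07FF) {
                out.write(0xE0 | ((c >> 12) & 0x0F));  // three bytes
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
            else {
                out.write(0xC0 | ((c >> 6) & 0x1F));   // two bytes
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Bytes produced by writeUTF, minus its 2-byte length prefix.
    static byte[] viaWriteUTF(String s) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(s);
            data.close();
            byte[] all = out.toByteArray();
            byte[] body = new byte[all.length - 2];
            System.arraycopy(all, 2, body, 0, body.length);
            return body;
        }
        catch (IOException exc) {
            throw new IllegalStateException(exc);
        }
    }

    public static void main(String[] args) {
        // one char from each range: ASCII, 0x0, 0x80-0x7ff, above 0x7ff
        String s = "A\u0000\u00e9\u1234";
        System.out.println("encoded length = " + encode(s).length);
        System.out.println("matches writeUTF = " +
            Arrays.equals(encode(s), viaWriteUTF(s)));
    }
}
```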
For more information about charsets and encodings, see section 9.7.1, Character Encodings, in "The Java™ Programming Language, Third Edition" by Arnold, Gosling, and Holmes. Also see the documentation for Supported Encodings and Charset. The document Unicode Transformation Formats: UTF-8 & Co. is another good place to learn about charsets and encodings.


USING REFLECTION TO CREATE CLASS INSTANCES

Imagine that you're doing some Java programming, and you need to create a new instance of the A class. You write some code like this:
A aref = new A();
Pretty obvious, right?
Suppose, however, you take a step further and specify that the name of the class is found in a string made available at run time. It's still possible to proceed, like this:
String classname; // can be either A, B, or C
A aref = null;
B bref = null;
C cref = null;
if (classname.equals("A"))
    aref = new A();
else if (classname.equals("B"))
    bref = new B();
else
    cref = new C();
This code works, but it's cumbersome. Also, it can't be expanded much further without major effort.
There's another approach that works much better in this kind of situation. The basic idea is that you use Class.forName to obtain a java.lang.Class object for a class whose string name you specify. java.lang.Class is a class whose instances represent Java types, such as classes and interfaces and arrays. After you obtain a java.lang.Class instance, you can call newInstance to create a new object of the class represented by the java.lang.Class instance. The code looks like this:
Class cls = Class.forName(classname);
Object obj = cls.newInstance();
This sequence creates an object of the class whose string name is classname.
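Both calls can fail at run time, so real code needs to deal with the checked exceptions. Here is a minimal sketch that does so. It assumes a JDK newer than the one used for this tip, where cls.newInstance() is deprecated, and therefore calls getConstructor().newInstance() on the no-argument constructor instead. The class name CreateByName and the choice to return null on failure are made up for illustration.

```java
import java.lang.reflect.InvocationTargetException;

class CreateByName {
    // Create an object of the named class using its public
    // no-argument constructor; return null if that is not possible.
    static Object create(String classname) {
        try {
            Class<?> cls = Class.forName(classname);
            return cls.getConstructor().newInstance();
        }
        catch (ClassNotFoundException exc) {
            return null;   // no class with that name
        }
        catch (NoSuchMethodException exc) {
            return null;   // no public no-arg constructor
        }
        catch (InstantiationException exc) {
            return null;   // abstract class or interface
        }
        catch (IllegalAccessException exc) {
            return null;   // constructor not accessible
        }
        catch (InvocationTargetException exc) {
            return null;   // constructor itself threw
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "java.util.ArrayList", "java.lang.Integer", "no.such.Class"
        };
        for (int i = 0; i < names.length; i++) {
            Object obj = create(names[i]);
            System.out.println(names[i] + " -> " +
                (obj == null ? "could not create"
                             : obj.getClass().getName()));
        }
    }
}
```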
After you have a java.lang.Class instance, you can also find out other things about the represented class, for example, what methods and fields it contains. You can look up methods by name, and use reflection to call these methods.
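As a small illustration of these lookups, the sketch below checks whether a class has a public method or field with a given name and calls a no-argument method by name. The class name Inspect and its helpers are hypothetical; the reflection calls themselves (getMethods, getFields, getMethod, invoke) are standard.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class Inspect {
    // True if the class has at least one public method with this name.
    static boolean hasPublicMethod(Class<?> cls, String name) {
        Method[] methods = cls.getMethods();
        for (int i = 0; i < methods.length; i++) {
            if (methods[i].getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    // True if the class has a public field with this name.
    static boolean hasPublicField(Class<?> cls, String name) {
        Field[] fields = cls.getFields();
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    // Look up a public no-argument method by name and call it.
    static Object callNoArg(Object target, String methodName) {
        try {
            Method meth = target.getClass().getMethod(methodName);
            return meth.invoke(target);
        }
        catch (Exception exc) {
            throw new IllegalArgumentException(
                "cannot call " + methodName + ": " + exc);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicMethod(String.class, "length"));
        System.out.println(hasPublicField(Integer.class, "MAX_VALUE"));
        System.out.println(callNoArg("hello", "length"));
    }
}
```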
Let's look at an example to make these ideas a little more concrete. The example uses java.lang.Class and reflection to implement a class and method exerciser. The idea is that you have some classes and methods, and you'd like to write a driver program to test them. For example, for this input:
$ java NewDemo A string1 string2 @ f2 string3 string4 string5
the driver creates an object of class A, using string1/string2 as string arguments to the A constructor. The driver then calls A.f2 for the created object, using string3/string4/string5 as arguments to the f2 method.
Note that the driver program doesn't know anything about the A class. It's written in a general way to work with any class. The driver looks up and manipulates class and method names using java.lang.Class and reflection.
Here's what the code looks like:
import java.lang.reflect.*;
public class NewDemo {
    Class cls;
    Object obj;
    Constructor ctor;
    Object ctorargs[];
    Method meth;
    Object methargs[];
    String args[];
    int divpos;
    // parse input of the form:
    //
    // classname arg1 arg2 ...
    // @ methodname arg1 arg2 ...
    public NewDemo(String a[]) throws
        ClassNotFoundException,
        NoSuchMethodException {
        args = a;
        // search for @ divider in input
        divpos = -1;
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("@")) {
                divpos = i;
                break;
            }
        }
        if (divpos < 1 || divpos + 1 == args.length) {
            throw new IllegalArgumentException(
                "bad syntax");
        }
        // load appropriate class
        // and get Class object
        String classname = args[0];
        cls = Class.forName(classname);
        // find the constructor,
        // if arguments specified for it
        if (divpos > 1) {
            Class ptypes[] = new Class[divpos - 1];
            for (int i = 0; i < ptypes.length; i++) {
                ptypes[i] = String.class;
            }
            ctor = cls.getConstructor(ptypes);
            // set up the constructor arguments
            ctorargs = new Object[divpos - 1];
            for (int i = 0; i < ctorargs.length; i++) {
                ctorargs[i] = args[i+1];
            }
        }
        // find the right method
        String methodname = args[divpos + 1];
        int firstarg = divpos + 2;
        Class ptypes[] =
            new Class[args.length - firstarg];
        for (int i = 0; i < ptypes.length; i++) {
            ptypes[i] = String.class;
        }
        meth = cls.getMethod(methodname, ptypes);
        // set up the method arguments
        methargs = new Object[ptypes.length];
        for (int i = 0; i < methargs.length; i++) {
            methargs[i] = args[firstarg + i];
        }
    }
    // create an object of the specified class
    public void createObject() throws
        InstantiationException,
        IllegalAccessException,
        InvocationTargetException {
        // if no constructor arguments were given,
        // use the no-arg constructor
        if (ctor == null) {
            obj = cls.newInstance();
        }
        // otherwise use constructor with arguments
        else {
            obj = ctor.newInstance(ctorargs);
        }
    }
    // call the method and display its return value
    public void callMethod() throws
        IllegalAccessException,
        InvocationTargetException {
        Object ret = meth.invoke(obj, methargs);
        System.out.println("return value: " + ret);
    }
    public static void main(String args[]) {
        // create a NewDemo instance
        // and call the method
        try {
            NewDemo nd;
            nd = new NewDemo(args);
            nd.createObject();
            nd.callMethod();
        }
        // display any resulting exception
        catch (Exception e) {
            System.out.println(e);
            System.exit(1);
        }
    }
}
Here is a test class you can use with the demo:
public class A {
    public A() {
        System.out.println("call: A.A()");
    }
    public A(String s1, String s2) {
        System.out.println(
            "call: A.A(" + s1 + "," + s2 + ")");
    }
    public void f1() {
        System.out.println("call: A.f1()");
    }
    public double f2(
        String s1, String s2, String s3) {
        System.out.println("call: A.f2(" + s1 + "," + s2 +
            "," + s3 + ")");
        return 12.34;
    }
}
You need to compile this class in the usual way.
The NewDemo constructor is used to parse the input line, to find the java.lang.Class object for the specified class, and to find the appropriate constructor and method. Then createObject is called to create an instance of the class. Finally, callMethod is used to actually call the method for the class instance.
The constructor and method are found by creating a java.lang.Class vector that contains the types of each parameter to the constructor or method. This example takes the liberty of assuming that all parameters are of String type, and thus the corresponding java.lang.Class object is "String.class". Then getConstructor and getMethod are used to find the actual constructor or method to use.
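The same lookup works for parameters that are not strings, provided the Class vector names the exact declared types. Primitive parameters are represented by objects such as int.class and double.class, and the arguments are passed as wrapper objects. The sketch below is an illustration only; the class name ParamTypes and its helpers are made up, and it calls static methods of java.lang.Math and java.lang.Integer rather than the A class.

```java
import java.lang.reflect.Method;

class ParamTypes {
    // True if cls has a public method with exactly these
    // parameter types.
    static boolean exists(
        Class<?> cls, String name, Class<?>[] ptypes) {
        try {
            cls.getMethod(name, ptypes);
            return true;
        }
        catch (NoSuchMethodException exc) {
            return false;
        }
    }

    // Find a public static method by name and parameter types,
    // then call it with the given arguments.
    static Object callStatic(Class<?> cls, String name,
        Class<?>[] ptypes, Object[] args) {
        try {
            Method meth = cls.getMethod(name, ptypes);
            return meth.invoke(null, args);
        }
        catch (Exception exc) {
            throw new IllegalArgumentException(exc.toString());
        }
    }

    public static void main(String[] args) {
        Class<?>[] twoInts = { int.class, int.class };
        Object[] intArgs = { Integer.valueOf(3), Integer.valueOf(7) };
        System.out.println(
            callStatic(Math.class, "max", twoInts, intArgs));

        Class<?>[] oneString = { String.class };
        Object[] strArgs = { "42" };
        System.out.println(
            callStatic(Integer.class, "parseInt", oneString, strArgs));

        // the wrapper type Integer.class does not match an
        // int parameter
        Class<?>[] twoIntegers = { Integer.class, Integer.class };
        System.out.println(exists(Math.class, "max", twoIntegers));
    }
}
```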
If you run the driver like this:
java NewDemo A @ f1
the output is:
call: A.A()
call: A.f1()
return value: null
Here are additional driver runs:
java NewDemo A @ f2 str1 str2 str3
java NewDemo A str4 str5 @ f1
java NewDemo A str6 str7 @ f2 str8 str9 str10
And here are their respective results:
call: A.A()
call: A.f2(str1,str2,str3)
return value: 12.34

call: A.A(str4,str5)
call: A.f1()
return value: null

call: A.A(str6,str7)
call: A.f2(str8,str9,str10)
return value: 12.34
Some examples of driver runs with bad input are:
java NewDemo
java NewDemo A
java NewDemo A @
java NewDemo A str1 @ f1
java NewDemo A @ f1 str1
java NewDemo B @ f1
java NewDemo A str11 str12 @
java NewDemo A @ f3 str1
The results are:
java.lang.IllegalArgumentException: bad syntax
java.lang.IllegalArgumentException: bad syntax
java.lang.IllegalArgumentException: bad syntax
java.lang.NoSuchMethodException
java.lang.NoSuchMethodException: f1
java.lang.ClassNotFoundException: B
java.lang.IllegalArgumentException: bad syntax
java.lang.NoSuchMethodException: f3
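The failures above all happen while looking things up. A different kind of failure occurs when the lookup succeeds but the called method itself throws an exception: the reflection layer wraps it in an InvocationTargetException, and getCause returns the original. The following sketch shows the difference, using Integer.parseInt as an assumed stand-in for a method that can fail. The class name UnwrapDemo is hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class UnwrapDemo {
    // Call Integer.parseInt(text) through reflection and describe
    // what happened.
    static String parse(String text) {
        try {
            Method meth =
                Integer.class.getMethod("parseInt", String.class);
            return "ok: " + meth.invoke(null, text);
        }
        catch (InvocationTargetException exc) {
            // the called method threw; unwrap the real exception
            return "target threw: " +
                exc.getCause().getClass().getName();
        }
        catch (NoSuchMethodException exc) {
            return "lookup failed: " + exc;
        }
        catch (IllegalAccessException exc) {
            return "access denied: " + exc;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("12"));
        System.out.println(parse("abc"));
    }
}
```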
The techniques illustrated here are extremely powerful, and allow you to manipulate types and methods by name at run time. These techniques are used by tools such as interpreters, debuggers, and object exercisers.
For more information about using reflection to create class instances, see section 11.2.1, The Class class, and section 11.2.6, The Method Class, in "The Java™ Programming Language, Third Edition" by Arnold, Gosling, and Holmes.
Copyright 2003 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.
This document is protected by copyright. For more information, see:

Sun, Sun Microsystems, Java, Java Developer Connection, J2SE, J2EE, and J2ME are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.