May 15, 2005

Forget public static final String, Enums are here!

I mentioned in an earlier post that I'd post some examples of real-world J2SE 5.0 usage. It's taken a while, but I've now been using the latest Java in two separate projects for some time. While not strictly meant for this purpose, enums are very handy for avoiding typos in repetitive public static final String SOMESTRINGCONSTANT declarations. Instead of declaring such a constant, you can define an enum and never need to type the actual string value. Since an enum is really a special kind of class, you can just call toString() on its constants: define enum SOMESTRING {value} and use SOMESTRING.value.toString() in code.

This has two benefits: you avoid writing the same, or almost the same, string twice (public static final String NODENAME_TOYS = "toys";), and references to enum constants are checked at compile time, so a misspelled constant simply won't compile.
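
As a minimal sketch of the idea (the NodeName enum, its constants and the matches helper are made up for illustration), the constant name doubles as the string value:

```java
// Hypothetical node names; NodeName and its constants are illustrative only.
enum NodeName { toys, games, books }

class NodeNameDemo {
    // Returns true when the given element name equals the constant's name.
    static boolean matches(String elementName, NodeName expected) {
        // The default toString() of an enum constant returns its declared name.
        return expected.toString().equals(elementName);
    }

    public static void main(String[] args) {
        // No string literal for the constant is typed anywhere in this class.
        System.out.println(NodeName.toys.toString());        // prints "toys"
        System.out.println(matches("toys", NodeName.toys));  // prints "true"
    }
}
```

Writing NodeName.toyz anywhere fails at compile time, whereas a mistyped "toyz" literal would only show up at runtime.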

A common case where you can use this pattern is when parsing XML. While you can and should have a schema for your XML format, your application logic still needs to understand the XML tree and process different nodes differently. So instead of making the node names constants, you could define an enum for them. For example:
public enum ToysAllowedNodes {Car, Ball, Scooter, Figures} and then use the array returned by ToysAllowedNodes.values() to iterate over the nodes for node-specific processing.
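
Here is a rough sketch of how that iteration might look. The lookup and process helpers and the labels they return are assumptions for illustration; a real application would get the element names from its XML parser rather than from an array.

```java
enum ToysAllowedNodes { Car, Ball, Scooter, Figures }

class ToyNodeDemo {
    // Returns the matching constant, or null when the element name is not allowed.
    static ToysAllowedNodes lookup(String elementName) {
        for (ToysAllowedNodes node : ToysAllowedNodes.values()) {
            if (node.toString().equals(elementName)) {
                return node;
            }
        }
        return null;
    }

    // Hypothetical node-specific processing; the returned labels are placeholders.
    static String process(ToysAllowedNodes node) {
        switch (node) {
            case Car:
            case Scooter:
                return "wheeled toy: " + node;
            default:
                return "other toy: " + node;
        }
    }

    public static void main(String[] args) {
        // Stand-in for element names read from an XML document.
        String[] elementNames = { "Car", "Ball", "Robot" };
        for (String name : elementNames) {
            ToysAllowedNodes node = lookup(name);
            System.out.println(node == null ? "skipped: " + name : process(node));
        }
    }
}
```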

Posted by thoughts at 04:52 PM

May 02, 2005

Quick and Dirty Hack for UTF-8 Support in ResourceBundle

I just don't get why the Sun folks didn't fix this in J2SE 1.5. By specification, PropertyResourceBundles, or more exactly the Properties files behind them, are Latin-1 (i.e. ISO 8859-1) encoded: "When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings." However, since every Latin-1 character has the same code point in Unicode, and the ASCII range is even byte-for-byte identical in UTF-8, I don't see a reason why they couldn't have just added support for UTF-8 to the Properties class.
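
To make the problem concrete, here is a small sketch that loads the same property twice, once written with Unicode escapes and once saved as raw UTF-8. The class name, the greeting key and the Finnish sample text are assumptions for illustration; only Properties.load(InputStream) comes from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Properties;

class PropertiesEncodingDemo {

    // Encodes text with the named charset; both charsets used here are always available.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads the bytes the way a properties file is loaded: decoded as ISO 8859-1.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String expected = "p\u00e4iv\u00e4\u00e4";

        // File content using Unicode escapes, which is plain ASCII on disk.
        byte[] escaped = encode("greeting=p\\u00e4iv\\u00e4\\u00e4", "ISO-8859-1");
        // The same line saved directly as UTF-8.
        byte[] rawUtf8 = encode("greeting=" + expected, "UTF-8");

        System.out.println(expected.equals(loadValue(escaped, "greeting"))); // true
        System.out.println(expected.equals(loadValue(rawUtf8, "greeting"))); // false
        // Each two-byte UTF-8 sequence was read as two separate Latin-1 characters.
        System.out.println(loadValue(rawUtf8, "greeting").length());         // 9 instead of 6
    }
}
```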

While PropertyResourceBundle only has an implicit reference to the Properties class, the problem is the overall bad design of the ResourceBundle class hierarchy. The superclass ResourceBundle has two responsibilities: it acts both as a superclass and as a factory for loading ResourceBundles. ResourceBundle handles the loading of PropertyResourceBundles, which themselves inherit from ResourceBundle, and you can already smell a problem with this suspicious implementation. Generally, a superclass should never need to know anything about its subclasses. The getBundle() methods are declared final, so there's no way to replace the default PropertyResourceBundle implementation. Sun has two answers to this problem: either use the native2ascii tool to escape every character in your Properties file that Latin-1 cannot represent, or implement your own ResourceBundle class.

Using native2ascii by hooking it up to your Ant build as a task is fine, but when you are developing and adding UTF-8 strings to your Properties file, running native2ascii after every change is just an extra burden. On Sun's forums, Craig McClanahan discusses how you could use your own ResourceBundle class instead of Properties files to resolve the encoding problem. But the issue with custom ResourceBundle classes is that they are inherently different from PropertyResourceBundle: you would need a custom class for each locale you support. Since the ResourceBundle class handles the loading of PropertyResourceBundles and the methods are marked final, you are stuck with the Latin-1 encoding if you want to use Properties files.
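
For comparison, a class-based bundle might look roughly like the sketch below. It uses ListResourceBundle from the JDK; the Messages and Messages_fi class names, the greeting key and the texts are all made up for illustration, and the forum post may well describe a different variant.

```java
import java.util.ListResourceBundle;

// Hypothetical default bundle; the name Messages and its key are illustrative only.
class Messages extends ListResourceBundle {
    protected Object[][] getContents() {
        return new Object[][] { { "greeting", "Good day" } };
    }
}

// Hypothetical Finnish bundle: every supported locale needs one more class like this.
class Messages_fi extends ListResourceBundle {
    protected Object[][] getContents() {
        // Escapes keep this source file independent of the compiler's source encoding.
        return new Object[][] { { "greeting", "Hyv\u00e4\u00e4 p\u00e4iv\u00e4\u00e4" } };
    }
}

class ClassBundleDemo {
    public static void main(String[] args) {
        // The bundles are instantiated directly here to keep the sketch self-contained.
        System.out.println(new Messages().getString("greeting"));             // prints "Good day"
        System.out.println(new Messages_fi().getString("greeting").length()); // prints 12
    }
}
```

The strings live in compiled code, so the Latin-1 restriction of Properties files no longer applies, but translators now have to edit Java source instead of a simple key=value file.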

The whole problem is stupid. Properties files should have supported UTF-8 in the first place, but the change to support it could have been made at any time since. Assuming UTF-8 when reading an existing properties file wouldn't have broken anything for files that stick to ASCII and Unicode escapes: this backwards compatibility with ASCII is the basic reason why UTF-8 is so popular. All is not lost, though: you could use your own ResourceBundle factory class for loading ResourceBundles and wrap each PropertyResourceBundle in a class that adds UTF-8 support. Here's a quick and dirty hack to do just that:


import java.io.UnsupportedEncodingException;
import java.util.Enumeration;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public abstract class Utf8ResourceBundle {

public static final ResourceBundle getBundle(String baseName) {
  ResourceBundle bundle = ResourceBundle.getBundle(baseName);
  return createUtf8PropertyResourceBundle(bundle);
}

public static final ResourceBundle getBundle(String baseName, Locale locale) {
  ResourceBundle bundle = ResourceBundle.getBundle(baseName, locale);
  return createUtf8PropertyResourceBundle(bundle);
}

public static ResourceBundle getBundle(String baseName, Locale locale, ClassLoader loader) {
  ResourceBundle bundle = ResourceBundle.getBundle(baseName, locale, loader);
  return createUtf8PropertyResourceBundle(bundle);
}

private static ResourceBundle createUtf8PropertyResourceBundle(ResourceBundle bundle) {
  if (!(bundle instanceof PropertyResourceBundle)) return bundle;

  return new Utf8PropertyResourceBundle((PropertyResourceBundle)bundle);
}

private static class Utf8PropertyResourceBundle extends ResourceBundle {
  PropertyResourceBundle bundle;

  private Utf8PropertyResourceBundle(PropertyResourceBundle bundle) {
    this.bundle = bundle;
  }
  /* (non-Javadoc)
   * @see java.util.ResourceBundle#getKeys()
   */
  public Enumeration<String> getKeys() {
    return bundle.getKeys();
  }
  /* (non-Javadoc)
   * @see java.util.ResourceBundle#handleGetObject(java.lang.String)
   */
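  protected Object handleGetObject(String key) {
    String value = (String)bundle.handleGetObject(key);
    // A missing key must yield null instead of a NullPointerException below.
    if (value == null) return null;
    try {
      // Recover the raw bytes that were decoded as Latin-1 and decode them as UTF-8.
      return new String (value.getBytes("ISO-8859-1"),"UTF-8") ;
    } catch (UnsupportedEncodingException e) {
      // Shouldn't fail - but should we still add logging message?
      return null;
    }
  }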

}
}

Above, I've implemented Utf8PropertyResourceBundle as a private static nested class, but of course you could implement it as a public type if you wanted to use it explicitly. If you look at its handleGetObject method, the byte conversion to UTF-8 is really the only thing these classes do, and it is the thing that Sun missed in their implementation of PropertyResourceBundle.
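
The conversion itself can be tried in isolation. The sketch below assumes a value whose UTF-8 bytes were decoded as Latin-1 during loading; the class and method names and the sample text are made up for illustration.

```java
import java.io.UnsupportedEncodingException;

class Utf8ReinterpretDemo {

    // Simulates loading a UTF-8 file as a properties file: its bytes are decoded as ISO 8859-1.
    static String misreadAsLatin1(String original) {
        try {
            return new String(original.getBytes("UTF-8"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // The same conversion as in handleGetObject: recover the bytes, then decode them as UTF-8.
    static String reinterpretAsUtf8(String latin1Decoded) {
        try {
            return new String(latin1Decoded.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "p\u00e4iv\u00e4\u00e4";
        String misread = misreadAsLatin1(original);

        System.out.println(misread.length());                              // 9: garbled on load
        System.out.println(original.equals(reinterpretAsUtf8(misread)));   // true: repaired
        System.out.println("toys".equals(reinterpretAsUtf8("toys")));      // true: ASCII untouched
    }
}
```

One caveat: the conversion assumes every value really was stored as UTF-8. A value that reaches the wrapper as genuine Latin-1 text above the ASCII range, for example one written with a Unicode escape, is not valid UTF-8 once turned back into bytes and will most likely come out garbled.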

Posted by thoughts at 11:37 AM