Unicode system in Java

Programming languages rely on character encoding standards. These standards define how the characters of written languages are mapped to numeric values. Like many other programming languages, Java uses the Unicode standard to represent characters. This post introduces the Java Unicode System.

What is a Unicode System?

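Unicode is a worldwide character encoding standard that assigns a unique number, called a code point, to each character. It can represent almost every written language in the world. Java stores text as 16-bit characters based on this standard.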

Why Unicode System?

Before Unicode emerged, many different standards were used to encode characters. These included:

  1. ASCII
    ASCII, short for American Standard Code for Information Interchange, is one of the oldest and most common character encoding standards. It is a 7-bit code with 128 characters, including the letters A-Z (both uppercase and lowercase), the digits 0-9, and some basic symbols.
  2. ISO 8859-1
    ISO 8859-1 is an 8-bit standard developed for Western European languages. It includes the 128 ASCII characters plus 128 additional code values.
  3. KOI-8
    KOI-8 is an 8-bit standard originally developed for Russian. It includes the Latin and Russian (Cyrillic) alphabets in both uppercase and lowercase.
  4. GB 18030 and BIG-5
    GB 18030 and Big5 are standards developed for Chinese. GB 18030 covers the 20,902 Han characters carried over from earlier Chinese standards along with many additional characters and DBCS symbols, while Big5 covers traditional Chinese characters.

These standards had two problems. First, the same code value could represent different characters in different standards. Second, encodings for larger character sets used variable lengths, so a single character might take 1 byte, 2 bytes, or more.
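The first problem can be illustrated with a short sketch that decodes one byte value under two of the older standards. The class and method names here are hypothetical, and the example assumes the running JDK provides the KOI8-R charset (a common KOI-8 variant), which is not among the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one byte value, two different characters.
class LegacyEncodingDemo {

    // Decodes a single byte with the given charset and returns the
    // Unicode code point of the resulting character.
    static int decodeToCodePoint(byte value, Charset charset) {
        String decoded = new String(new byte[] { value }, charset);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        byte value = (byte) 0xE4;

        // Assumption: the JDK in use ships the KOI8-R charset.
        Charset koi8r = Charset.forName("KOI8-R");

        int latin = decodeToCodePoint(value, StandardCharsets.ISO_8859_1);
        int cyrillic = decodeToCodePoint(value, koi8r);

        // Print code points instead of glyphs so the output does not
        // depend on the console's own encoding.
        System.out.printf("ISO 8859-1: U+%04X%n", latin);    // Latin small a with diaeresis
        System.out.printf("KOI8-R:     U+%04X%n", cyrillic); // Cyrillic capital De
    }
}
```

The same byte, 0xE4, decodes to a Latin letter under ISO 8859-1 and to a Cyrillic letter under KOI8-R, so text is only readable if the reader knows which standard the writer used.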

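Unicode was developed to solve these problems by giving every character in every language its own unique code value. Unicode was originally designed around 16-bit values, so a Java char occupies 2 bytes. Characters added to Unicode later that do not fit in 16 bits are represented in Java by a pair of char values.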
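A minimal sketch of what the 2-byte char means in practice follows. The class and helper names are hypothetical; the calls themselves (Character.BYTES, Character.toChars, String.codePointCount) are standard library members, with Character.BYTES assuming Java 8 or later.

```java
// Hypothetical demo class: char size and characters beyond 16 bits.
class UnicodeCharDemo {

    // Number of 16-bit char values the string occupies.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // A char is always 2 bytes (16 bits) in Java.
        System.out.println("Bytes per char: " + Character.BYTES);

        // A Unicode escape stores the code value directly: U+0041 is 'A'.
        String letter = "\u0041";
        System.out.println(charUnits(letter) + " char, "
                + codePoints(letter) + " code point");

        // U+1F600 does not fit in 16 bits, so Java stores it as two chars.
        String beyond16Bits = new String(Character.toChars(0x1F600));
        System.out.println(charUnits(beyond16Bits) + " chars, "
                + codePoints(beyond16Bits) + " code point");
    }
}
```

For most everyday text, one char corresponds to one character, but code that counts or indexes characters should use the code point methods when input may contain characters outside the 16-bit range.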

Conclusion

Unicode is a global standard for character encoding. It originated as a solution to the problems of earlier standards, in which the same code value could mean different characters and character lengths varied. Java represents each char as a 2-byte (16-bit) value based on this standard. This post gave an overview of the Java Unicode System and why it was needed.

About the author

Naima Aftab

I am a software engineering professional with a profound interest in writing. I am pursuing technical writing as my full-time career and sharing my knowledge through my words.