The IUP Journal of Telecommunications
A Comparative Study of UTF-8, UTF-16, and UTF-32 of Unicode Code Point

Unicode is a critical enabling technology for developers who want to internationalize applications for global environments. Unicode assigns a unique number to every character, irrespective of platform, program, or language. The Unicode Standard has been adopted in the industry by Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, and many others. Unicode is required by modern standards such as XML, Java and WML, and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, is among the most significant recent global software technology advances. Each of the three encoding formats, UTF-8, UTF-16 and UTF-32, has its own pros and cons. This paper compares the three formats.

 
 

Unicode is the first truly successful multilingual character set standard (Ken Lunde, 2008). It enables a single software product or website to be targeted across multiple platforms, languages and countries without reengineering. Its code space has room for more than a million code points and covers the characters of all the widely used languages of the world. Each character is assigned a unique number called a Unicode code point, conventionally written in hexadecimal with a U+ prefix. For example, the Latin capital letter ‘A’ has the code point U+0041, which is 65 in decimal. Unicode is a superset of earlier character sets such as ASCII and ISO/IEC 8859-1 (Latin-1). ASCII can represent only 128 characters, comprising the uppercase and lowercase Latin letters, the digits and some special characters. Latin-1 can represent only 256 characters; the first 128 are the same as in ASCII and the other 128 cover additional letters and symbols used in Western European languages.
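The relationship between a character, its code point and the three encoding formats can be illustrated with a short Python sketch. The helper name `describe` and the sample characters are chosen here for illustration only and are not taken from the paper. The big-endian codec variants are used so that no byte order mark is added to the byte counts.

```python
def describe(ch: str) -> dict:
    """Return the code point of one character and its size in each UTF format."""
    cp = ord(ch)  # the Unicode code point as an integer
    return {
        "code_point": f"U+{cp:04X}",                  # hexadecimal notation
        "decimal": cp,
        "utf8_bytes": len(ch.encode("utf-8")),       # 1 to 4 bytes
        "utf16_bytes": len(ch.encode("utf-16-be")),  # 2 or 4 bytes
        "utf32_bytes": len(ch.encode("utf-32-be")),  # always 4 bytes
    }


# Sample characters: Basic Latin, Latin-1 range, rest of the BMP, supplementary plane.
for ch in ["A", "é", "€", "😀"]:
    print(ch, describe(ch))
```

For ‘A’ this reports U+0041 (65 in decimal), encoded in 1 byte in UTF-8, 2 bytes in UTF-16 and 4 bytes in UTF-32. Characters outside the Basic Multilingual Plane, such as the last sample, need 4 bytes in all three formats.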

The computer industry has adopted Unicode as the basis for building international software that can be easily adapted to meet the needs of particular locations and cultures. Incorporating Unicode into client-server or multitiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode allows data to be transported through many different systems without corruption, which means that data held by software products developed earlier can be converted into Unicode without loss. The standard includes the European alphabetic scripts, Middle Eastern right-to-left scripts, and Asian and African scripts.
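As a minimal sketch of converting legacy data into Unicode without loss, the following Python snippet converts Latin-1 bytes to UTF-8 and back. The function name `legacy_to_utf8` and the sample word are assumptions made for illustration. The sketch also assumes that the legacy encoding of the input is known in advance, which is not always the case in practice.

```python
def legacy_to_utf8(data: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a legacy character set and re-encode them as UTF-8."""
    text = data.decode(legacy_encoding)  # bytes -> Unicode code points
    return text.encode("utf-8")          # code points -> UTF-8 bytes


legacy_bytes = "café".encode("latin-1")  # 'é' is the single byte 0xE9 in Latin-1
utf8_bytes = legacy_to_utf8(legacy_bytes, "latin-1")

# Round trip: converting back to the legacy set reproduces the original bytes.
restored = utf8_bytes.decode("utf-8").encode("latin-1")
print(legacy_bytes, utf8_bytes, restored == legacy_bytes)
```

The round trip succeeds because every Latin-1 character has a corresponding Unicode code point. The reverse direction is lossless only for text restricted to characters that the legacy set contains.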

 
 
