The IUP Journal of Telecommunications

A Comparative Study of UTF-8, UTF-16, and UTF-32 of Unicode Code Point

Article Details

Pub. Date	:	May, 2012
Product Name	:	The IUP Journal of Telecommunications
Product Type	:	Article
Product Code	:	IJTC51205
Author Name	:	Sanjeev Kumar
Availability	:	YES
Subject/Domain	:	Science & Technology
Download Format	:	PDF Format
No. of Pages	:	11

Abstract

Unicode is a critical enabling technology for developers who want to internationalize applications for global environments. It assigns a unique number to every character, irrespective of platform, program, or language. The Unicode Standard has been adopted across the industry by Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, and many others. Unicode is required by modern standards such as XML, Java and WML, and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology advances. Each of the three encoding formats, UTF-8, UTF-16 and UTF-32, has its own pros and cons. This paper compares the three formats.
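As a rough illustration of the storage trade-off between the three formats, the following Python sketch counts the bytes each format needs for a few sample characters. The sample characters and the helper name `encoded_sizes` are illustrative assumptions, not taken from the paper.

```python
# Sample characters chosen for illustration (assumption, not from the paper):
# a Latin letter, an accented Latin letter, a Devanagari letter, and an emoji.
SAMPLES = ["A", "\u00e9", "\u0905", "\U0001F600"]


def encoded_sizes(ch):
    """Return the number of bytes ch occupies in UTF-8, UTF-16, and UTF-32."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-le" codecs do not prepend a byte order mark, so only the
        # bytes of the character itself are counted.
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 uses one to four bytes per character, UTF-16 uses two or four, and UTF-32 always uses four, which is the kind of size-versus-simplicity trade-off a comparison of the formats has to weigh.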

Description

Unicode is the first truly successful multilingual character set standard (Ken Lunde, 2008). It enables a single software product or website to be targeted across multiple platforms, languages and countries without reengineering. Its codespace has room for more than a million code points (1,114,112, from U+0000 to U+10FFFF) and covers the characters of all widely used languages of the world. Each character is assigned a unique number called a Unicode code point, conventionally written in hexadecimal. For example, Latin capital letter 'A' has the code point U+0041 in hexadecimal, or 65 in decimal. Unicode is a superset of earlier character sets such as ASCII. ASCII can represent only 128 characters, which include the Latin upper- and lowercase letters, digits, and some special characters. Eight-bit extensions of ASCII such as ISO/IEC 8859-1 (Latin-1) can represent only 256 characters; the first 128 are the same as in ASCII and the other 128 are used for characters of other European languages.
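The code point example and the 128- and 256-character limits can be checked with a short Python sketch. The helper names `u_notation`, `fits_in_ascii`, and `fits_in_8_bits` are hypothetical and introduced only for this illustration.

```python
def u_notation(ch):
    """Format a character's code point in the conventional U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def fits_in_ascii(ch):
    """True if the code point is within the 128 values ASCII can represent."""
    return ord(ch) < 128


def fits_in_8_bits(ch):
    """True if the code point is within the 256 values of an 8-bit character set."""
    return ord(ch) < 256


print(ord("A"))         # decimal code point of Latin capital letter A
print(u_notation("A"))  # the same code point in U+ hexadecimal notation

# An accented Latin letter and a Devanagari letter, chosen for illustration.
for ch in ["A", "\u00e9", "\u0905"]:
    print(u_notation(ch), fits_in_ascii(ch), fits_in_8_bits(ch))
```

The accented letter falls outside ASCII but inside the 8-bit range, while the Devanagari letter fits in neither, which is the gap Unicode was designed to close.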

The computer industry has adopted Unicode, which gives it a global basis for building international software that can be easily adapted to meet the needs of particular locations and cultures. Incorporating Unicode into client-server or multitiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without modification. It allows data to be transported through many different systems without corruption, and data held in earlier character sets can be converted to Unicode without loss. The standard includes the European alphabetic scripts, Middle Eastern right-to-left scripts, and Asian and African scripts.
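The claim that legacy data can be moved to Unicode without loss can be sketched as a round trip in Python. This is a minimal sketch assuming the legacy data is in ISO/IEC 8859-1 (Latin-1); the function names and the sample bytes are hypothetical.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 by decoding to Unicode text first."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data):
    """Reverse conversion; raises UnicodeEncodeError if the text uses
    characters that Latin-1 cannot represent."""
    return data.decode("utf-8").encode("latin-1")


legacy = b"caf\xe9"  # the word "cafe" with an accented e, stored as Latin-1
converted = latin1_to_utf8(legacy)

print(converted)
print(utf8_to_latin1(converted) == legacy)  # round trip restores the original
```

The conversion is lossless in this direction because every Latin-1 character has a Unicode code point; the reverse direction only works for text that stays within the legacy character set.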
