UTF-16 Item

From GM-RKB
(Redirected from 16-bit Unicode Character)

See: Text Item, UTF-16 Standard, Unicode, Java char Variable, Unicode Text File.



References

2011

  • http://en.wikipedia.org/wiki/UTF-16
    • QUOTE: UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064[1] numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.

      The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.[2] It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time.

      UTF-16 is officially defined in Annex C of the international standard ISO/IEC 10646[3]. It is also described in "The Unicode Standard" version 2.0 and higher, as well as in the IETF's RFC 2781.

  1. $2^{16} - 2 \times 2^{10} + 2^{10} \times 2^{10}$, where $2^{16}$ is the BMP, $- 2 \times 2^{10}$ is the interval U+D800–U+DFFF, and $2^{10} \times 2^{10}$ are the higher planes.
  2. "Questions about encoding forms". http://www.unicode.org/faq//utf_bom.html. Retrieved 12 November 2010. 
  3. ISO/IEC 10646-1:2000(E), pp. 890-892; ISO/IEC 10646:2003(E), pp. 1364-1366; ISO/IEC 10646:2012(E) Final Committee Draft (FCD), p. 2208; The FCD contains a reference to clauses 9 and 10, pp. 15-17.
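
The quoted description (one or two 16-bit code units per code point, 1,112,064 encodable code points, 96.9% agreement with UCS-2 in the range 0-0xFFFF) can be checked with a short Python sketch. The function name `utf16_code_units` and the constants `ENCODABLE` and `UCS2_AGREEMENT` are illustrative names chosen here, not part of any standard or library. The surrogate arithmetic follows the usual UTF-16 definition, and the sketch is a minimal illustration rather than a complete encoder.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode code space 0..0x10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("U+D800..U+DFFF is reserved for surrogates")
    if code_point < 0x10000:
        # BMP code point: a single code unit, identical to UCS-2.
        return [code_point]
    # Higher planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


# Footnote 1: BMP size, minus the surrogate interval, plus the higher planes.
ENCODABLE = 2**16 - 2 * 2**10 + 2**10 * 2**10

# Share of the range 0..0xFFFF where UCS-2 and UTF-16 give the same result.
UCS2_AGREEMENT = (2**16 - 2 * 2**10) / 2**16

if __name__ == "__main__":
    print([hex(u) for u in utf16_code_units(0x20AC)])   # BMP: one code unit
    print([hex(u) for u in utf16_code_units(0x1F600)])  # higher plane: two code units
    print(ENCODABLE)
    print(f"{UCS2_AGREEMENT:.1%}")
```

The results can be cross-checked against Python's built-in codec, for example `chr(0x1F600).encode("utf-16-be")`, which yields the same two code units in big-endian byte order.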